ABSTRACT


This  report  describes  the  development  and  application 
of  a  program  to  forecast  important  air/ocean  parameters  using 
the  method(s)  of  model  output  statistics.  The  focus  of  this 
operationally  oriented  study  is  to  forecast  atmospheric 
marine  horizontal  visibility  using  a  discrete  analysis  of 
observed visibility and the Navy's Operational Global Atmospheric
Prediction System (NOGAPS) model output parameters.

Three  strategies  (two  based  on  maximum-probability  and  one 
based  on  natural-regression)  are  compared  to  two  multiple 
linear  regression  methods.  The  primary  data  set  is  from  a 
North  Atlantic  Ocean  area  bounded  approximately  by  the  North 
American  coast  from  Norfolk,  Va.  to  St.  Johns,  Newfoundland, 
and  then  eastward  to  about  37.5°W.  Both  the  dependent  and 
independent  data  were  derived  from  the  same  basic  set.  New 
or  unfamiliar  concepts,  in  addition  to  the  primary  methodology, 
include  the  statistical  division  of  the  North  Atlantic  Ocean 
into  physically  homogeneous  areas,  two  new  threshold  models 
for  the  application  of  linear  regression  equations,  linear 
regression  based  upon  a  'decision-tree'  concept,  functional 
dependence of predictors and class errors. Results show
that the methodology proposed by Preisendorfer outperforms
multiple linear regression.
I. INTRODUCTION AND BACKGROUND


Model  output  statistics  (MOS)  is  a  technique  whereby 
parameters  output  from  numerical  weather  prediction  models 
(predictors)  are  statistically  processed,  with  observed 
data, to produce forecasts of one of the following categories
of parameters (as predictands):

a.  operationally  important  parameters  not  output  by  the 
numerical  prediction  model  (e.g.,  visibility,  cloud 
cover, ceiling).

b.  model  output  parameters  whose  predictive  skill  is 
improved  (e.g.,  surface  wind,  temperature)  due  to 
correction  of  numerical  model  bias  and/or  scale. 

Historically,  the  methodology  has  consisted  of  generating 
empirical  equations  by  a  linear,  least-squares  regression 
model.  This  technique  is  used  by  both  the  National  Weather 
Service  and  the  United  States  Air  Force  Air  Weather  Service 
and has demonstrated operationally usable skill in forecasting
numerous weather elements at locations over land
throughout the world [Best and Pryor, 1983]. Attempts by
the United States Navy to forecast open-ocean fog and visibility
using linear regression equations have shown skills
of marginal operational usefulness but exceeding those of
persistence and climatology [Aldinger, 1979; Yavorsky, 1980;
Selsor, 1980; Koziara et al., 1983; Renard and Thompson, 1984].


Presumably,  this  level  of  performance  is  due,  in  part,  to 
the  lack  of  'calibrated'  fog  and  visibility  observations. 
Shipboard  weather  observers  lack  sufficient  reference  points 
to  be  able  to  accurately  estimate  the  range  of  atmospheric 
visibility. 

In  the  spring  of  1983,  the  United  States  Navy  made  the 
decision  to  begin  development  of  a  MOS  program  to  forecast 
operational  air/ocean  parameters  over  the  oceans  of  the 
world. Primarily because of the importance of horizontal
visibility to the mariner, this parameter was selected as
the initial candidate. However, because of less-than-perfect
prior results using linear regression in the North Pacific
Ocean, it was decided to investigate other methodologies
to determine if a better one could be found.

This  study  presents  statistical  methodologies  proposed  by 
Preisendorfer  (1983  a,b,c) .  Specifically,  three  strategies, 
two  based  on  maximum-probability  and  one  based  on  natural- 
regression,  are  further  developed,  tested  and  applied  to  sets 
of  model  output  parameters  from  both  the  North  Pacific  and 
North  Atlantic  Ocean  areas.  In  addition,  multiple  linear 
regression  is  applied  to  the  same  data.  Innovative  threshold 
techniques,  developed  by  Lowe  (1984a),  are  also  applied,  and 
methodologies  are  compared. 

In  the  following  discussion,  a  sufficient  number  of  terms 
and  symbols  are  defined  to  allow  readers  without  strong 
statistical backgrounds to understand the results. However,
for a proper understanding of the Preisendorfer (1983 a,b,c)
methodology, readers are encouraged to read Appendix A,
which contains a detailed discussion. Similarly, details on
the linear regression model and threshold procedures [Lowe,
1984a] are to be found in Appendix B.


II.  OBJECTIVE  AND  APPROACH 


The objective of this study is to determine if a statistical
methodology, applied to discrete values of model
output and derived parameters, can improve upon the forecasting
of horizontal marine atmospheric visibility when
compared to linear regression. The approach is as follows:

a.  define  categorical  groupings  of  visibility  which 
relate  to  operational  use  at  sea. 

b.  develop  and  apply  the  Preisendorfer  (1983  a,b,c) 
methodology  using  July  1979  North  Pacific  Ocean  data. 

c.  apply  the  methodology  developed  in  b.  above  to  June 
1983  North  Atlantic  Ocean  data. 

d.  compare  Preisendorfer  (1983  a,b,c)  results  to  those 
of  the  Lowe  (1984a)  linear  regression  approach  for 
the North Pacific and North Atlantic Ocean data sets.


III.  DATA 


A.  VISIBILITY  OBSERVATIONS  AND  SYNOPTIC  CODE 

Visibility  observations  at  sea  are  reported  as  one  of 
ten  synoptic  codes,  ranging  from  90  (visibility  less  than 
50 m) to 99 (visibility equal to or greater than 10 km).
However, in view of the inexactness of observing and recording
marine visibility in category form, and the further
degradation of its interpretation by users in forecasting,
a simplified categorization of visibility was developed as
follows:


category     synoptic code     visibility range

I            90-94             < 2 km
II           95-96             > 2 km and < 10 km
III          97-99             > 10 km

This scheme is based upon the following operational
criteria, which apply when observed visibility falls below
the indicated value:

1.  10  km  (5  n  mi) — United  States  Navy  aircraft  carrier 
flight recovery operations change from visual to controlled
approach [Department of the Navy, 1979].

2.  2  km  (1  n  mi) — sounding  of  reduced  visibility  signals 
for  all  vessels  operating  in  international  waters. 

(The term 'reduced visibility' is not defined in the
International Regulations for Preventing Collisions at
Sea, 1972. However, United States Navy Captains and
Merchant Marine Masters generally consider it to be
1 n mi.)
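
The three-category scheme above can be expressed as a simple lookup. The following is a minimal sketch only; the function name visibility_category is hypothetical and is not taken from the programs used in this study.

```python
def visibility_category(synoptic_code: int) -> str:
    """Map a ship synoptic visibility code (90-99) to category I, II or III.

    Hypothetical helper illustrating the categorization table above.
    """
    if not 90 <= synoptic_code <= 99:
        raise ValueError(f"not a ship visibility code: {synoptic_code}")
    if synoptic_code <= 94:
        return "I"    # codes 90-94
    if synoptic_code <= 96:
        return "II"   # codes 95-96
    return "III"      # codes 97-99


for code in (90, 95, 99):
    print(code, visibility_category(code))
```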


B.  NORTH  PACIFIC  OCEAN  DATA 

The  data  from  the  North  Pacific  Ocean  are  described  by 
Selsor (1980) and Koziara et al. (1983). Only the July 1979
model initialization (TAU00) data are used, consisting of 19
model output parameters (MOP) from the Northern Hemisphere
models operational in 1979, namely, the Mass Structure Analysis,
the Primitive Equation and the Marine Wind Models; and
one climatological visibility parameter from the National
Oceanic and Atmospheric Administration's National Climatic
Data Center (NCDC), Asheville, North Carolina. Two additional
parameters  were  derived  from  this  set.  A  description  of  the 
parameters  is  found  in  Appendix  C. 


C.  NORTH  ATLANTIC  OCEAN  DATA 
1. Area

The North Atlantic Ocean, from 0° to 80°N, was
divided into physically homogeneous areas by Lowe (1984b)
using an appropriate cluster analysis technique. The primary
area used in this study is identified as area 3W on Fig. 1,
which illustrates the North Atlantic Ocean homogeneous areas.
This  area  was  chosen  because  of  the  relatively  frequent 
occurrence  of  poor  visibility  as  compared  to  the  other  areas. 
A summary of visibility frequencies, for each homogeneous
area  and  three  visibility  categories,  is  contained  in  Table  I. 

2. Time Period

Data from 15 May 1983 through 15 July 1983 were
combined to form the June 1983 data set, hereafter referred
to as FATJUNE. FATJUNE was chosen as the initial data set
because of the high frequency of occurrence of poor visibility
during this period. In order to maximize the credibility
of visibility observations, 1200 GMT synoptic ship
report data were used exclusively since this time corresponds
to daylight over the entire area of study during FATJUNE.

Model output parameter data (predictors) at 1200 GMT
model output time, hereafter referred to as TAU00, were used
in the development of the Preisendorfer (1983 a,b,c) methodology;
time was not available to pursue the scheme beyond that
stage. Thus, TAU00 represents model initialization time.
However, the term 'forecast' will be used throughout this
study to represent the estimate of visibility at this
initialization time.

3. Synoptic Weather Reports

All synoptic visibility observations (predictand
data) for this study were quality-control checked and provided
by the Naval Oceanography Command Detachment (NOCD)
co-located with the NCDC. Those furnished observations which
contain  systematic  observer  error  or  are  suspect  or  obviously 
erroneous,  as  determined  from  the  data  quality  indicators, 
are  not  incorporated  in  the  final  data  set. 
4. Predictor Parameters

Fifty TAU00 model output parameters (MOP's) (predictor
data) were provided for the period of study by the Fleet
Numerical Oceanography Center (FNOC), Monterey, California.
These parameters are from their current operational prediction
model, the Navy Operational Global Atmospheric Prediction
System (NOGAPS). All MOP's were interpolated from model grid
coordinates to synoptic ship observation positions using a
linear interpolation scheme. Of the 50 parameters provided,
only 35 were used in the development of the Preisendorfer
(1983 a,b,c) and Lowe (1984a) methodologies, the remainder
being considered as either having little likelihood of
importance in the forecasting of visibility or not usable
due to the lack of significant digits (which were lost during
the transfer from FNOC tapes to the main computer center's
mass storage data system). Twelve additional parameters were
derived from the interpolated MOP's. Seven of these are
equations derived from a linear regression model which will
be described in Chapter V and Appendix B. Each equation
represents an estimate of the visibility category, which is
used as a predictor. A list of all of the predictor parameters
is provided in Appendix D.

D.  DEPENDENT/INDEPENDENT  DATA  SETS 

Due to the limited amount of data available to this
study for each of the North Atlantic Ocean homogeneous
areas, it was necessary to withhold one-third of the
observations from the developmental model to use as an independent
data set. This was accomplished by the use of a
counter and transfer statement in the computer programs which
prevented every third observation from entering the developmental
computations. To ensure that the dependent and independent
data were representative of the same population, a
95% confidence interval for proportions [Miller and Freund,
1977] was established from the entire data set, for each
visibility category, and the dependent and independent data
sets were constrained to have visibility frequencies within
these established confidence intervals. This same procedure
was applied to the North Pacific Ocean data for consistency of
method. Table II summarizes the dependent and independent
data for both the North Atlantic Ocean and North Pacific
Ocean data sets.
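
The withholding and representativeness check described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, and the interval is taken to be the usual large-sample (normal approximation) interval for a proportion, p +/- 1.96 sqrt(p(1-p)/n), with n the size of the entire data set.

```python
import math


def split_every_third(observations):
    """Withhold every third observation as independent data (counter-based)."""
    dependent, independent = [], []
    for count, obs in enumerate(observations, start=1):
        if count % 3 == 0:
            independent.append(obs)
        else:
            dependent.append(obs)
    return dependent, independent


def proportion_interval(p, n, z=1.96):
    """Assumed large-sample 95% confidence interval for a proportion."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


def representative(categories_all, categories_subset, labels=("I", "II", "III")):
    """True if every category frequency in the subset lies inside the
    interval established from the entire data set."""
    n = len(categories_all)
    for label in labels:
        p_all = categories_all.count(label) / n
        low, high = proportion_interval(p_all, n)
        p_subset = categories_subset.count(label) / len(categories_subset)
        if not low <= p_subset <= high:
            return False
    return True
```

In use, the category labels of the full set would be split with split_every_third and both parts passed to representative before any developmental computations are run.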
IV. PRELIMINARY EXPERIMENTS

A.  TERMS  AND  SYMBOLS 

The  terms  and  statistical  symbols  defined  below  will  be 
used  throughout  the  remainder  of  this  report.  The  formal 
mathematical  definitions  can  be  found  in  Appendices  A  and 
E. 

1. Maximum-probability strategy -- choosing forecast
visibility categories based upon the highest conditional
probabilities of visibility within a predictor interval.

2. MAXPROB1 -- designation of the maximum-probability
strategy in which ties of the highest conditional
probabilities in a predictor interval are resolved by
the generation of a random number.

3. MAXPROB2 -- designation of the maximum-probability
strategy in which ties of the highest conditional
probabilities in a predictor interval are resolved by
assigning the lowest visibility category, of those
tied, as the forecast category.

4. Natural-regression strategy -- choosing forecast visibility
categories based upon the statistical average
of the conditional probabilities of visibility within
a predictor interval.

5. a0 -- the probability of a zero-class visibility category
forecast error (e.g., if visibility category I is forecast,
it is also observed).

6. a1 -- the probability of a one-class visibility category
forecast error (e.g., if visibility category I is
forecast and category II is observed).

7. a2 -- the probability of a two-class visibility category
forecast error (e.g., if visibility category I is
forecast and category III is observed).

8. CE -- class error parameter defined as a1 + 2a2, used to
identify the first predictor.

9. PP -- the potential predictability of visibility by
any given predictor.

10. FD -- the functional dependence of one predictor on
another. This is a measure of functional dependence
of a statistical kind and not of the deterministic
kind. The term 'functional dependence' is used by
Preisendorfer (1983c) and, being sufficiently descriptive
of the concept, it will be used herein.

11. RSS FD -- root sum squared FD. The functional dependence
of a predictor on all predictors already included in
the developmental model. It is equal to the square
root of the sum of the squares of the individual FD's.

12. TS1 -- threat score for visibility category I computed
from a contingency table.

13. ATS1 -- adjusted threat score for visibility category
I which removes the influence of the data set category
frequency.

14. AAO -- adjusted a0. A contingency table statistic
which removes the influence of the most frequent visibility
category in a set of data (similar to a normalized
value).

15. EPI -- equally populous predictor interval used to
discretize the predictors.
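
The class-error statistics defined above can be illustrated from a 3 x 3 contingency table of forecast versus observed categories. The sketch below is not the program used in this study; the function names and the sample counts are invented for illustration, and the threat score is assumed to follow the conventional definition (hits divided by forecasts plus observations minus hits), the formal definition being that of Appendix E.

```python
def class_error_stats(table):
    """table[i][j] = number of cases with forecast category i+1 and
    observed category j+1. Returns (a0, a1, a2, CE)."""
    total = sum(sum(row) for row in table)
    a = [0.0, 0.0, 0.0]
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            a[abs(i - j)] += count / total   # 0-, 1- or 2-class error
    ce = a[1] + 2.0 * a[2]                   # CE = a1 + 2*a2
    return a[0], a[1], a[2], ce


def threat_score(table, k):
    """Assumed conventional threat score for category index k (0 = category I)."""
    hits = table[k][k]
    forecasts = sum(table[k])
    observed = sum(row[k] for row in table)
    denominator = forecasts + observed - hits
    return hits / denominator if denominator else 0.0


# Invented counts, for illustration only.
sample = [[20, 10, 5],
          [10, 15, 10],
          [5, 10, 115]]
a0, a1, a2, ce = class_error_stats(sample)
print(round(a0, 3), round(a1, 3), round(a2, 3), round(ce, 3))
print(round(threat_score(sample, 0), 3))
```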

B.  COMPUTER  PROGRAMS 

Four  computer  programs  were  developed  to  test  the 
proposed  Preisendorfer  (1983  a,b,c)  methodology.  The 
programs  are  on  file  in  the  Department  of  Meteorology,  Naval 
Postgraduate  School,  Monterey,  California,  93943. 

1. A program to compute a0, a1, CE and PP for all predictors,
all strategies (MAXPROB1, MAXPROB2 and Natural-Regression)
and a single number of EPI's. Statistics
for the three strategies are based upon the same
predictor(s) rather than the best predictor(s) for each
strategy. It was determined during program development,
and will be shown in Chapter VI, that, in general, each
of the strategies chose the same predictor(s).

2. A program to compute FD for all predictors, on a given
predictor, for a given number of EPI's, and to compute
the upper 5% critical value (FD(96)) by Monte-Carlo
means (Appendix A).

3. A program to construct contingency tables and to compute
skill and threat scores, for both the dependent
and independent data.

4. A program to generate 100 random data sets, from the
marginal probabilities of the predictor(s) in the
developmental model, and to compute upper and lower
5% critical values for a0 and a1 to be used for testing
the significance of the results from the Preisendorfer
(1983 a,b) methodology against chance.
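
The three strategies handled by the first program can be sketched for a single predictor interval as follows. This is an illustration under assumptions, not the program itself: the function names are hypothetical, the input is the count of observed categories I, II and III within one interval, and the natural-regression strategy is read here as rounding the conditional mean category to the nearest category, which is only one plausible reading of the definition in Section A (the formal definition is in Appendix A).

```python
import random


def maxprob_forecast(counts, tie_rule="lowest", rng=random):
    """Forecast category (1, 2 or 3) with the highest conditional frequency.

    tie_rule="random" mimics MAXPROB1 (ties resolved by a random number);
    tie_rule="lowest" mimics MAXPROB2 (lowest tied category is forecast).
    """
    best = max(counts)
    tied = [k + 1 for k, c in enumerate(counts) if c == best]
    if len(tied) == 1 or tie_rule == "lowest":
        return tied[0]
    return rng.choice(tied)


def natural_regression_forecast(counts):
    """Assumed reading: conditional mean category, rounded to nearest."""
    total = sum(counts)
    mean_category = sum((k + 1) * c for k, c in enumerate(counts)) / total
    return min(3, max(1, int(mean_category + 0.5)))


print(maxprob_forecast([2, 5, 5], tie_rule="lowest"))
print(natural_regression_forecast([8, 1, 1]))
```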

C. BEHAVIOR OF a0 AND THREAT SCORES

Before attempting a formal application of the Preisendorfer
(1983 a,b,c) methodology, it was considered prudent
to investigate the behavior of certain statistics as the
number of equally populous predictor intervals was changed
and as new predictors were added. It was found, during
program testing and before a formal procedure had been established,
that the independent data threat score of visibility
category I (TS1) generally showed higher values than other
threat scores (TS2, TS12) for the independent data. Therefore,
it was decided that the dependent and independent data
a0 and TS1 scores would be compared. The statistic a0 was
chosen because it is the single most important scoring
parameter in the Preisendorfer methodology.

The experiment consisted of choosing the first predictor
as that one which gave the highest a0 value when divided
into ten equally populous intervals. Once this predictor
was chosen, dependent and independent data a0 and TS1 scores
were computed for each number of intervals as the number was
varied from two to 100. Prior to proceeding to the next
step, the number of intervals which gave the highest independent
data TS1 score was identified and the first predictor
was held at this number of intervals for the remainder of
the experiment.

Subsequent predictors were chosen by both a maximum a0
test and a functional dependence test. As each subsequent
predictor was identified, its number of equally populous
intervals was varied from two to 50 (or less, as the maximum
array size was set at 120,000). The number of equally populous
intervals giving the highest independent data TS1 was
identified and held fixed for the following stage. This procedure
was repeated until either six predictors were used or
until a new predictor addition did not allow the comparison
of at least intervals two through ten, due to computer
storage limitations. It should be noted here that all of
the North Atlantic Ocean parameters, not including linear-
regression equations, were used in these experiments and,
subsequently, some parameters were removed from consideration
(Appendix D).

1. Maximum a0 Method

The first NOGAPS predictor selected was SMF, which
was varied from two to 100 EPI's (Fig. 2a); the highest
TS1 score was obtained with six intervals. The second predictor
chosen, when SMF was held at six intervals and all
others at ten, was DTDP, which produced the highest a0 value
for two predictors. Holding SMF at six intervals, DTDP was
varied from two to 50 intervals (Fig. 2b) and the highest
TS1 score was obtained at 20 intervals. Anticipating problems
with the subsequent array size with respect to the number of
predictors which could be included, the secondary TS1 maximum
at 16 intervals was used for further stepping. The third and
subsequent predictors and their optimum interval sizes were
PS at 12 (Fig. 2c), UBLW at ten (Fig. 2d) and V400 (Fig. 2e).
The optimum number of intervals for V400 was not germane as
no further stepping was done after this step. As illustrated
in Fig. 2, the dependent data statistics asymptotically approach
unity, as predictors are added, while the independent data
statistics (approximate maximum values: a0 = .70, TS1 = .35)
show no further increase after the third predictor is included,
which may imply a limit as to how well the methodology performs
on this particular data set.
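
The array-size concern mentioned above can be illustrated with simple arithmetic. The sketch assumes, and this is an assumption not stated in the text, that the working array holds one cell for every combination of predictor intervals, so that its size is the product of the interval counts; the fifth predictor is taken at ten intervals purely for illustration.

```python
from math import prod

MAX_ARRAY_SIZE = 120_000  # maximum array size quoted in the text


def joint_cells(intervals):
    """Number of joint predictor-interval cells (assumed array layout)."""
    return prod(intervals)


# SMF, DTDP, PS, UBLW, plus a fifth predictor at ten intervals.
with_16 = joint_cells([6, 16, 12, 10, 10])
with_20 = joint_cells([6, 20, 12, 10, 10])
print(with_16, with_16 <= MAX_ARRAY_SIZE)  # 115200 True
print(with_20, with_20 <= MAX_ARRAY_SIZE)  # 144000 False
```

Under this assumption, holding DTDP at 16 rather than 20 intervals is what leaves room for a fifth predictor within the stated limit.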

2. Functional Dependence Method

As functional dependence is not considered until after
the selection of the first NOGAPS predictor, Fig. 2a is also
applicable to this method. Subsequent predictors were chosen
as those having the lowest RSS FD using ten equally populous
intervals. The predictors selected and their optimum interval
sizes, for the TS1 score, were RH at three (Fig. 3a),
DUDP at four (Fig. 3b), VOR925 at two (Fig. 3c), ENTRN at
14 (Fig. 3d) and UBLW (Fig. 3e), which was the last predictor
considered. As seen for the maximum a0 method, the dependent
data statistics asymptotically approach unity. However, the
independent data statistics continue to grow at least through
the addition of the sixth predictor (approximate maximum
values: a0 = .71, TS1 = .38). This method gave better results
than the maximum a0 method, though it, too, may imply a
limit. The results of this experiment also tend to show a
preferential selection of a small number of EPI's, for best
independent data TS1 score, as well as indicating that functional
dependence is a relatively good choice as a deciding
factor for choosing predictors.

D.  BEHAVIOR  OF  FUNCTIONAL  DEPENDENCE 

Another statistic investigated prior to the formal
application of the Preisendorfer (1983 a,b,c) methodology
was the distribution of functional dependence (FD) calculated
from 100 randomly generated data sets. The FD calculation is
based upon the relationship of the distribution of one predictor
to another. Because the predictors are divided into
the same number of EPI's for the calculation, the probability
of a randomly generated number falling into any given interval
for either predictor will be the same. Therefore, the
randomly generated FD values should be a function only of
the number of intervals and the number of data cases (subsequent
randomly generated calculations, during the formal
application of the methodology, showed this to be true).

The randomly generated FD experiment consisted of computing
the mean, upper and lower 5% critical values, and the
standard deviation of the 100 randomly generated values for
both 1526 observations (as in the North Atlantic Ocean Area
3W dependent data) and 3682 observations (as in the North
Pacific Ocean dependent data) and a comparison of the
results. As illustrated in Fig. 4, the FD values are similar
for a given interval size, differing only in the size of the
confidence interval and the standard deviation. The FD
values calculated for 3682 observations lie totally within
the upper and lower 5% critical values for 1526 observations.
Because of this relationship, future FD(96) values, used to
qualitatively determine how well a new predictor will contribute
to the developmental model, can be obtained by reading
from the graph rather than using valuable computer
resources, providing the number of equally populous intervals
is less than or equal to ten.
V. PROCEDURES


A.  PREISENDORFER  METHODOLOGY 

1.  Determination  of  the  First  Predictor  in  Relation 
to  the  Number  of  Predictor  Intervals 

A matter not considered in Preisendorfer (1983 a,b,c)
is how to choose an optimum number of equally populous predictor
intervals (EPI's) into which predictor data should
be divided. During the course of development, two important
realizations became evident, namely, (a) there is a tendency
for the methodology to give better results using a small
number of intervals, and (b) the NPS W.R. Church Computer
Center limits internal computer storage space to two megabytes
for routine programs. The first suggested, while the
second forced, the research to be limited to EPI's of less
than or equal to ten if more than three or four predictors
were to be considered. Once this was established, a procedure
was developed to look at all EPI's within the stated
limit.

The procedure involves computing the initial statistics
(a0, a1, CE and PP) for each predictor, for each strategy
(maximum-probability and natural-regression) and for EPI's
of two through ten. Then, the best first predictor for each
number of EPI's is determined, for each strategy, by meeting
one or both of the following conditions, when considered in
the indicated order:

a. lowest CE

b. highest PP

Once the best predictor for each number of EPI's is
known, it is then necessary to determine the optimum number
of EPI's. This is accomplished by computing threat and skill
scores (Appendix E) for both the dependent and independent
data and choosing, as the optimum number of EPI's, that which
gives both a relatively high adjusted a0 (AAO) for the dependent
data and a relatively high adjusted threat score for
visibility category I (ATS1) for the independent data. This
becomes a somewhat subjective endeavor and remains as the
only imprecise step in the methodology.

The statistic ATS1 is used on the independent data,
instead of a0, because it is the poor visibility categories
(I and II) that are of primary forecast interest and their
forecastability is manifested in their threat scores. It
will be shown that, in general, the adjusted threat scores
for visibility category II (ATS2) and for combined visibility
categories I and II (ATS12) are small compared to ATS1, or
negative, and that ATS12 is maximized when ATS1 is maximized.
Additionally, it will be shown that maximum a0 does not
necessarily coincide with maximum ATS1 in the independent
data. Hence, if a0 were used, the optimum combination of
predictors necessary to forecast the poor visibility categories
would not be included.

Once the number of EPI's is established, it is fixed
for all subsequent predictors considered for the developmental
model. Holding the number of intervals fixed is not an
absolute necessity; however, it allows for a much more rapid
development of the model. Once this number is determined for
the first predictor, it is also the number of EPI's used to
calculate FD for the next predictor. The next stage
statistics (a0, a1, CE and PP) are also computed with each
predictor divided into this same number of EPI's.
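
Division of a predictor into equally populous intervals can be sketched as a rank-based assignment. This is a minimal illustration, not the study's program: the function name is hypothetical, and tied predictor values are split by sort order, a simplification that the actual programs may handle differently.

```python
def equally_populous_intervals(values, n_intervals):
    """Return, for each value, the index (0 .. n_intervals-1) of the
    equally populous interval into which it falls."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices by rank
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * n_intervals // n
    return labels


# Six invented predictor values divided into three intervals of two each.
print(equally_populous_intervals([5.0, 1.0, 3.0, 2.0, 4.0, 6.0], 3))
# [2, 0, 1, 0, 1, 2]
```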

2. Choosing the Second Predictor

The second predictor to be included in the model is
determined from its FD on the first predictor and from the
increase in a0 resulting from its inclusion. This is accomplished
by computing a0 with two predictors, namely, the
first predictor, as determined above, with each of the
remaining predictors. Those predictors which do not increase
a0 above its value as determined with the first predictor
alone are removed from further consideration for inclusion
into the set of predictors in the developmental model. FD
for each of the remaining predictors vs. the first predictor
is computed. The remaining predictor with the lowest FD,
on the first predictor, is chosen as the second predictor in
the model.

3.  Choosing  Subsequent  Predictors 

Subsequent predictor determination is similar to the
second predictor determination. Compute a0 with N predictors
(N = 1,...,M+1; M = the number of predictors already in the
developmental model), that is, the first through Mth predictors,
as previously determined, and each of the remaining
predictors. Those predictors which do not increase a0 above
its value as determined with M predictors are removed from
further consideration. RSS FD is computed for each of the
remaining predictors and the one with the lowest RSS FD is
chosen as the Nth predictor in the model.
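
One selection stage, as described above, can be sketched as a screen on a0 followed by a minimum RSS FD choice. The sketch assumes the a0 and FD values have already been computed elsewhere; the function names and all numerical values are invented for illustration, and only the predictor names are taken from this report.

```python
import math


def rss_fd(fd_values):
    """Root sum squared FD of a candidate on the predictors already chosen."""
    return math.sqrt(sum(fd * fd for fd in fd_values))


def choose_next_predictor(a0_current, a0_with_candidate, fd_on_selected):
    """a0_with_candidate: {name: a0 when the candidate is added}.
    fd_on_selected: {name: [FD on each predictor already in the model]}.
    Returns the chosen name, or None if no candidate increases a0."""
    eligible = [name for name, a0 in a0_with_candidate.items()
                if a0 > a0_current]
    if not eligible:
        return None
    return min(eligible, key=lambda name: rss_fd(fd_on_selected[name]))


# Invented values for illustration only.
a0_with = {"RH": 0.65, "PS": 0.59, "DUDP": 0.62}
fd = {"RH": [0.3, 0.4], "PS": [0.1, 0.1], "DUDP": [0.6, 0.0]}
print(choose_next_predictor(0.60, a0_with, fd))  # RH
```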

4. Significance Tests

After each stage (i.e., after each new predictor to
be included in the developmental model is determined) it is
necessary to determine if the results are significant. This
is accomplished by Monte-Carlo means using the data set
marginal probabilities of the predictors and assuming equal
probability of occurrence for visibility categories (Appendix
A). The statistics a0 and a1 are computed for each of
100 randomly generated data sets of a size equal to the
number of observations in the dependent data set being tested,
and sorted from lowest to highest. The 96th value of a0
(a0(96)) and the fifth value of a1 (a1(05)) are retained as
the upper and lower 5% critical values. For developmental
model results to be significantly better than chance, a0
must be greater than or equal to a0(96) and a1 must be less
than or equal to a1(05).
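
The critical-value test above reduces to sorting the 100 Monte-Carlo values and reading off two order statistics. The following is a minimal sketch with hypothetical function names; generation of the random data sets themselves (Appendix A) is not shown, and the 100 values of each statistic are taken as given.

```python
def critical_values(random_a0, random_a1):
    """Return (a0(96), a1(05)) from 100 Monte-Carlo values of each statistic."""
    if len(random_a0) != 100 or len(random_a1) != 100:
        raise ValueError("expected 100 randomly generated values of each")
    a0_sorted = sorted(random_a0)   # lowest to highest
    a1_sorted = sorted(random_a1)
    return a0_sorted[95], a1_sorted[4]   # 96th and fifth values


def is_significant(a0, a1, random_a0, random_a1):
    """True if a0 >= a0(96) and a1 <= a1(05)."""
    a0_96, a1_05 = critical_values(random_a0, random_a1)
    return a0 >= a0_96 and a1 <= a1_05
```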

5. Terminating the Selection of Predictors

Model development continues until any one of four
conditions is met:

a. no more predictors remain to be considered.

b. results are no longer significant.

c. required computer region size exceeds that which is
allowed (two megabytes at the NPS W.R. Church Computer
Center).

d. independent data ATS1 does not increase for two
consecutive predictor additions. (It will be shown
that there is a point in the development of the model
where the skill and threat scores for the dependent
data diverge sharply from those for the independent
data. This condition for terminating model development
is a subjective attempt at taking this point into
consideration.)

Once the model development is complete, contingency tables of
forecast visibility categories vs. observed visibility
categories, for both the dependent and independent data, are
constructed. From the contingency tables, threat and skill scores
for both data sets are computed and compared.

B. COMPARISON METHODOLOGY

The results obtained from the Preisendorfer (1983 a,b,c)
methodology were compared to two variations of a linear,
least-squares regression model. The model chosen for the
comparison is that available in the BMDP Statistical Software
(namely BMDP2R) [University of California, 1981] using two new
threshold schemes developed by Lowe (1984c) (Appendix B). The
equations developed by BMDP2R include all predictors which
increased R-squared (the proportion of the predictand variance
explained by the estimation of the predictand from the multiple
regression equation) by at least 1%. An excellent description of
this procedure is given by Best and Pryor (1983), with R-squared
being equivalent to their R-value.

1.  Method  1 

The  first  linear  regression  method  consists  of 
generating  a  single  equation,  trained  on  the  dependent  data, 
with  the  predictand  set  equal  to  1,  2  or  3,  corresponding  to 
visibility  categories  I,  II  and  III,  respectively.  This 
equation  is  used  to  determine  threshold  values  (Appendix  B) 
and  is  then  applied  to  the  independent  data. 

2. Method 2

The  second  linear  regression  method  is  based  on  a 
decision-tree  scheme  using  two  linear-regression  equations 
trained  on  the  dependent  data.  The  first  equation  is 
generated  with  the  predictand  values  set  equal  to  zero  or 
one,  corresponding  to  combined  visibility  categories  I  and 
II  (0)  and  visibility  category  III  (1) .  The  second  equation 
is  generated  with  the  predictand  set  equal  to  zero  or  one, 
corresponding  to  visibility  category  I  (0)  and  visibility 
category  II  (1) .  Visibility  category  III  observations  are 
ignored  during  this  linear  regression.  Threshold  values  are 
then  computed  for  each  equation. 

When both equations and their associated threshold values are
known, the independent data set is sorted into visibility
category III and visibility category 'other' by the first
equation, and the 'other' category is sorted into visibility
categories I and II by the second equation.
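
The chaining of the two equations can be sketched as follows. This is an illustration only: it assumes each regression estimate is compared with a single threshold and that an estimate at or above the threshold selects the category coded 1. The coefficients, thresholds and function names are hypothetical placeholders; the actual threshold models are those of Appendix B.

```python
def linear_estimate(coeffs, intercept, x):
    """Evaluate a linear-regression equation for one observation."""
    return intercept + sum(c * v for c, v in zip(coeffs, x))

def decision_tree_category(x, eq1, thr1, eq2, thr2):
    """Return visibility category 1, 2 or 3 for predictor values x.

    eq1 separates category III (coded 1) from combined I/II (coded 0);
    eq2 separates category II (coded 1) from category I (coded 0).
    Each equation is a (coeffs, intercept) tuple.
    """
    if linear_estimate(*eq1, x) >= thr1:
        return 3                      # first equation: category III
    if linear_estimate(*eq2, x) >= thr2:
        return 2                      # second equation: category II
    return 1                          # otherwise category I

# Hypothetical one-predictor equations, for demonstration only.
eq1, thr1 = ([0.5], 0.0), 0.6
eq2, thr2 = ([1.0], 0.0), 0.5
categories = [decision_tree_category([v], eq1, thr1, eq2, thr2)
              for v in (2.0, 0.8, 0.2)]
```

Only observations that the first equation does not place in category III reach the second equation, which mirrors the way category III observations are ignored when the second equation is trained.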

Following the development of linear regression method 1 and
method 2, contingency tables are constructed, skill and threat
scores computed, and comparisons made with the results from the
Preisendorfer (1983 a,b,c) methodology.

VI. RESULTS


A.  NORTH  PACIFIC  OCEAN 

1. First-Predictor Selection and Interval Determination

The first predictor selected, for equally populous intervals
(EPI's) of four through ten, was EHF (Table III). The constant
value for a1, maximum-probability strategy, indicates that there
is no predictability for visibility category II (the least
frequent category in the data set) using a single predictor. A
comparison of the dependent data adjusted a0 (AA0) and
independent data adjusted threat score for visibility category I
(ATS1) subjectively determined the selection of five EPI's for
the developmental model (Table IV; Fig. 5).

2. Selecting Subsequent Predictors

Once the number of intervals and first predictor were known, a
new a0 computation was made with the first predictor and each of
the remaining predictors. Only six of the remaining 21
predictors, CLIMO, SEHF, THF, DDWW, H510 and RH, in combination
with EHF, gave new a0 values greater than that for EHF alone
(.697); these comprised the pool of predictors to be considered
for further development of the model. Functional dependence (FD)
with EHF was computed for each of these six predictors and DDWW
was chosen as the second predictor because it had the lowest FD.

For the determination of the third through sixth predictors, a
new a0 was computed as a function of all of the previously
selected predictors and each of the remaining predictors. At each
stage, the new a0 computation for each remaining predictor was
greater than that for the prior stage, so no further predictors
were eliminated from consideration. FD was then computed, for
each of the predictors being considered with each of the
predictors previously selected, and RSS FD determined. At any
given stage (three through six) the new predictor added to the
developmental model was the one with the lowest RSS FD. The third
through sixth predictors, in order of selection, are H510, RH,
THF and CLIMO (Table V).

3. Determining the Final Model

The final model for the Preisendorfer (1983 a,b,c) methodology
was determined by comparing the independent data contingency
table statistics, from each developmental stage, and choosing the
fourth stage because it gave the highest adjusted threat score
for visibility category I (ATS1) (Fig. 6). The contingency tables
for stage four and the related statistics for the three
strategies are shown in Table VI.

4. Linear Regression

A single linear-regression equation was developed from the North
Pacific Ocean data using method 1. Both the quadratic and
equal-variance threshold models (Appendix B) were applied but
only the threshold values from the equal-variance model were used
to compare methodologies. Table VII contains the linear
regression equation, the visibility category linear regression
statistics and the threshold values. Contingency tables and
related statistics for the dependent and independent data are
shown in Table VIII.

5.  Discussion 

The best results obtained from the North Pacific Ocean data were
from the Preisendorfer (1983 a,b,c) methodology, MAXPROB2
strategy, as it has the highest independent data adjusted threat
scores for visibility categories I and combined I/II (ATS1 = .20,
ATS12 = -.05). Each of the maximum-probability strategies
(MAXPROB1: ATS1 = .17, ATS12 = -.10) did better than linear
regression (ATS1 = .16, ATS12 = -.13), while natural-regression
shows the poorest skill (ATS1 = -.02, ATS12 = -.19).

It  appears,  from  Fig.  6,  that  most  of  the  usable 
forecastability  resides  in  the  first  predictor  chosen.  This 
would  indicate  that  it  may  be  profitable  to  search  for 
better  predictors  by  combining  model  output  parameters, 
conducting  dimensional  analysis  or  using  linear-regression 
equation  estimates  as  predictors  as  was  done  in  the  North 
Atlantic  Ocean  experiments  which  follow. 

B.  NORTH  ATLANTIC  OCEAN  AREA  3W 

Based upon the results obtained in the North Pacific Ocean, it
was decided to use the linear regression model to generate
equations which could be used as predictors. Seven such equations
were developed, each representing a different menu of parameters
available to the regression model. The seven equations are
included in Appendix D. The Preisendorfer (1983 a,b,c)
methodology then proceeded both with and without these
linear-regression equations available as predictors.

1. First Predictor Selection and Interval Determination

a. Without Linear-Regression Equations as Predictors

The first predictor, for EPI's of four through ten, varied with
the number of intervals (Table IX). A comparison of the dependent
data AA0 and the independent data ATS1 determined the selection
of eight EPI's for the model (Table X) and, therefore, SMF as the
first predictor. However, through investigator error, the model
was initially developed with five EPI's and E925 as the first
predictor. Therefore, both results will be presented.

b. With Linear-Regression Equations as Predictors

The first predictor for each EPI of four through ten is BM1, the
predictand estimate computed by the linear regression equation
developed when all of the predictors were available to the
regression model (Table XI). Two of the EPI's, namely four and
eight, have identical, and best, dependent data AA0 and
independent data ATS1 scores (Table XII, Fig. 7), so it was
decided to proceed with the developmental model for both
intervals.


2.  Selecting  Subsequent  Predictors 


Subsequent predictors were chosen in the same way as described in
the procedures and for the North Pacific Ocean experiment. The
predictors, not including linear regression equations as
predictors, are SMF, D850, RH, UBLW and ENTRN for eight EPI's
(Table XIII) and E925, U700, DVDP, STRTFQ, ENTRN and PS for five
EPI's (Table XIV). The predictors, including linear regression
equations as predictors, are BM1, U850, D500, V850, D1000 and
U1000 for four intervals (Table XV) and BM1, U500, ENTRN, DVDP
and BM4 for eight intervals (Table XVI). Significance tests were
made after each predictor selection and a0(96) and a1(05) values
are included in Tables XIII, XV and XVI. A comparison of the
behavior of critical level statistics, as predictors are added,
for both four and eight intervals, is shown in Figs. 8 and 9,
where array size is equal to the number of EPI's taken to a power
equal to the number of predictors included at that stage.
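
The array-size relation is simple to evaluate, and doing so shows how quickly storage grows as predictors are added. The helper name below is hypothetical and the loop is purely illustrative.

```python
def array_size(n_intervals, n_predictors):
    """Number of predictor-interval cells: EPI's raised to the power of
    the number of predictors included at that stage."""
    return n_intervals ** n_predictors

# Growth of the array for four and eight EPI's over six stages.
for stage in range(1, 7):
    print(stage, array_size(4, stage), array_size(8, stage))
```

With four intervals the array reaches 4096 cells at the sixth stage, whereas with eight intervals it already holds 32768 cells at the fifth.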

3. Determining the Final Model

The final model for the Preisendorfer (1983 a,b,c) methodology
was determined by comparing the independent data contingency
table statistics, from each developmental stage, and choosing
that stage which gave the highest adjusted threat score for
visibility category I (ATS1).

a. Without Linear Regression Equations as Predictors (Eight
Intervals)

It was determined, from Fig. 10, that the fifth stage gave the
best results (MAXPROB1, independent data: ATS1 = .19, ATS2 = .03,
ATS12 = -.05). The contingency tables for stage five and related
statistics for the three strategies are shown in Table XVII.

b. Without Linear Regression Equations as Predictors (Five
Intervals)

It was determined, from Fig. 11, that the fifth stage gave the
best results (MAXPROB2, independent data: ATS1 = .25, ATS2 = .02,
ATS12 = .01). The contingency tables for stage five and related
statistics for the three strategies are shown in Table XVIII.

c. With Linear Regression Equations as Predictors (Four
Intervals)

It was determined, from Fig. 12, that the fourth stage gave the
best results (MAXPROB2, independent data: ATS1 = .40, ATS2 = -.05,
ATS12 = .12). The contingency tables for stage four and related
statistics for the three strategies are shown in Table XIX.

d. With Linear Regression Equations as Predictors (Eight
Intervals)

It was determined, from Fig. 13, that the second stage gave the
best results (MAXPROB2, independent data: ATS1 = .32, ATS2 = -.14,
ATS12 = .02). The contingency tables for stage two and related
statistics for the three strategies are shown in Table XX.

4. Linear Regression

Both linear regression methods (single equation and decision
tree) and both threshold models (quadratic and equal variance)
[Lowe, 1984a] were used to compare with the Preisendorfer (1983
a,b,c) methodology in the North Atlantic Ocean Area 3W.
Additionally, the predictors available for regression were varied
as indicated in the following description. The first regression
was conducted with all available MOP's while the second
regression was conducted using only the best predictors from the
Preisendorfer methodology (defined as those predictors which,
alone, produced an a0 value greater than the frequency of
visibility category III in the dependent data). Table XXI
contains the linear-regression equations, associated visibility
category statistics and threshold values. Tables XXII through
XXVII contain the contingency tables and related statistics for
the dependent and independent data for each of the linear
regression variations.

5.  Discussion 

Table XXVIII summarizes each of the methodologies and strategies
applied to the North Atlantic Ocean Area 3W data. In general, the
maximum-probability strategy did better than the other methods or
strategies. Specifically, the best results overall were obtained
by the MAXPROB2 strategy, using predictors computed from linear
regression equations and four equally populous intervals. The
methodology without linear regression equations as predictors,
and all of the linear regression results, are about equivalent.
The best linear regression method is the decision tree, when all
MOP's are made available to the regression model. The results
obtained without linear regression equations as predictors appear
to discount the procedure established for choosing the number of
equally populous predictor intervals, but lend support to the
claim in Chapter V that there is a tendency for the Preisendorfer
(1983 a,b,c) methodology to give better results using a small
number of intervals.


VII.  CONCLUSIONS  AND  RECOMMENDATIONS 


The  primary  objective  of  this  study  was  to  determine 
if  the  Preisendorfer  (1983  a,b,c)  methodology  applied  to  the 
FNOC  NOGAPS  model  output  parameters  could  improve  upon  the 
forecasting  of  atmospheric  marine  horizontal  visibility,  in 
three  categories,  when  compared  to  the  more  traditional 
method  of  least  squares,  multiple  linear  regression.  It  was 
shown  that,  indeed,  the  proposed  methodology,  namely,  the 
maximum  probability  strategy,  was  superior  when  predictand 
estimates,  computed  from  linear  regression  equations 
themselves,  were  used  as  predictors. 

The method of determining the number of equally populous
predictor intervals requires further investigation. The results
from the North Atlantic Ocean Area 3W, without linear regression
equations as predictors, showed that the proposed method was not
the best, in that the number of intervals determined by the
method was eight but better results were obtained with five.
Additionally, only interval counts of ten or fewer were
considered here, due to storage limitations imposed by the
computer center. As a result, the optimum number of predictor
intervals remains undetermined.

Predictor determination appears to be adequate. At each stage of
development a unique predictor was selected. The only foreseeable
problem is if, during the first (initial) stage of development,
multiple predictors have identical CE and PP values, or, during
subsequent stages, multiple predictors have identical a0 and FD
values. Should this occur, the model development would have to
proceed, from that particular stage, with each of the identified
predictors.

The methodology appears to be sensitive, in two ways, to the
first predictor selected. First, there is an initial large value
for the independent data ATS1 and small incremental increases
thereafter for each new predictor added. Secondly, there is a
large difference in the initial independent data ATS1 values
between the Preisendorfer methodology without linear regression
equations as predictors (ATS1 = .13; .14) and that with linear
regression equations as predictors (ATS1 = .30), for the maximum
probability strategy.

The best strategy is MAXPROB2, followed by MAXPROB1, and then
natural-regression. Generally, natural-regression does worse than
linear regression. None of the methods did well in predicting
visibility category II, which may indicate that visibility would
be best handled as a two-category phenomenon.

The number of independent data observations (1526) in North
Atlantic Ocean Area 3W was sufficient to test the methodology.
This was demonstrated by the similarity of the results for Area
3W, without linear regression equations as predictors, to the
North Pacific Ocean results (3682 observations). The small
differences in the contingency table statistics for the
independent data for the two experiments can be attributed to
parameters being from different models and for different months.

The  following  recommendations  are  offered  for  future 
research  and  to  future  researchers: 

1. Investigate the problem of determining the optimum number of
equally populous predictor intervals. Possibly, a statistic
similar to the threat scores or adjusted threat scores could be
used, or, simply choose the interval, between two and ten, which
gives the highest adjusted threat scores for the independent
data. Alternatively, adopt, without further experimentation, the
number of EPI's as five, which appears to be a compromise between
a gross resolution of the predictor parameter range and a fine
(but too expensive) resolution of the predictor parameter range.

2. Investigate the use of potential predictability (PP) in
determining the selection of predictors. During the initial stage
of development, PP is computed for all available predictors and
provides a measure of each predictor's individual ability to
forecast visibility, but it is not used explicitly. One
possibility is to compute the mean and standard deviation of PP
during the initial stage and to remove from consideration those
predictors whose PP is not greater than the mean minus one
standard deviation or, more simply, not greater than the mean.
This would ensure that only those predictors which have a
relatively high prospect of forecasting visibility will be
available for subsequent selection.

3. Search for better predictors which are particularly suited to
visibility prediction. Recommended sources are: new, direct and
derived, model output parameters (including original model
output); non-dimensional parameters derived from dimensional
analysis; and boundary-layer parameters such as the optical
structure function (C_N^2) and extinction coefficients.

4.  Investigate  a  two-category  visibility  scheme. 

5.  Install  automatic  visibility  recorders  on  ocean-going 
military  and  civilian  passenger/cargo  ships.  This 
will  place  visibility  observations  on  a  more  objective 
basis  and  lead  to  improved  methods  of  forecasting 
visibility,  as  well  as  verifying  such  forecasts. 

6. Investigate new prediction models, preferably those which
attempt to manipulate the observed data to correct for probable
observer bias (following Selsor, 1980; Renard and Thompson,
1984). This would be unnecessary if recommendation 5 were acted
upon.

7.  Investigate  other  ocean  areas  and  seasons  to  determine 
if  the  physically  homogeneous  area  scheme  is  consistent 
and  viable.  Develop  prediction  tables  and  other  aids 
specifically  tailored  to  region  and  season. 


8. Use a statistic other than ATS1 for choosing the first
predictor and for comparing methods and strategies. It was used
in this study largely because of its greater magnitude, as
compared to ATS2 and ATS12. This was due to the relatively high
frequency of visibility category I in both data sets. In general,
this will not be the case. Because three visibility categories
are being considered, and good forecasts of the two poorest
visibility categories are desirable, a statistic such as ATS12
would be better suited as a consistent comparison statistic for
future researchers.

9. As soon as it is feasible, eliminate from further testing the
MAXPROB1 strategy in order to allow for more efficient and faster
program execution. The natural-regression strategy, though it
gave the poorest results in this study, should be re-examined
when predictands with relatively many discrete states (e.g.,
ceiling) are considered. It has, in such settings, potential to
outperform the more rigid linear regression technique.

PREISENDORFER (1983 a,b,c) FOR THE FORECASTING OF
ATMOSPHERIC MARINE HORIZONTAL VISIBILITY USING
MODEL OUTPUT STATISTICS

I.  INTRODUCTION 

The following discussion is based upon three unpublished research
papers by Preisendorfer (1983 a,b,c). His proposed methodology
deals with a simple statistical manipulation of model output
parameters (predictors) which have been transformed from
continuous to discrete quantities by grouping each predictor into
equally populous intervals. The procedural approach in applying
his methodology to model output statistics (MOS) forecasting is
as follows:

1. Generate predictand/predictor pairs of data using the United
States Navy Fleet Numerical Oceanography Center Navy Operational
Global Atmospheric Prediction System (NOGAPS) model output
(predictors) and synoptic ship visibility observations
(predictand) provided by the Naval Oceanography Command
Detachment, Asheville, NC, and generate bivariate plots.

2. Generate conditional probability tables based on the
distribution of the predictand/predictor pairs.

3. Define prediction strategies based on the conditional
probabilities.


4.  Compute  the  potential  predictability  of  visibility 
from  the  conditional  probability  tables. 

5.  Compute  skill  scores  of  the  prediction  strategies  and 
choose  the  first  predictor. 

6.  Repeat  steps  1,  2,  4,  and  5,  for  multiple  predictors. 

7.  Compute  functional  dependence  of  selected  vs.  potential 
subsequent  predictors. 

8.  Choose  the  next  predictor. 

9.  Repeat  steps  1,  2,  4,  5,  7,  and  8,  until  model 
development  is  terminated. 

For demonstration purposes, an artificial data set of 99 cases,
consisting of four predictors plus visibility (predictand), will
be used throughout this discussion. Each predictor parameter is
divided into three equally populous intervals and visibility is
divided into three categories, as illustrated in Table A1. The
four predictors are Evaporative Heat Flux (EHF), Fog Probability
Parameter (FTER), Relative Humidity (RH) and Air-Sea Temperature
Difference (ASTD). Visibility categories are defined by the
marine visibility observation codes (MVOC) included in the
categories.

TABLE A1

ARTIFICIAL DATA SET

Interval 1        Interval 2              Interval 3

EHF ≤ 2.65        2.65 < EHF ≤ 4.44       EHF > 4.44
FTER ≤ .024       .024 < FTER ≤ .9        FTER > .9
RH ≤ 85.9         85.9 < RH ≤ 90.0        RH > 90.0
ASTD ≤ 1.02       1.02 < ASTD ≤ 1.91      ASTD > 1.91

Visibility Category I:      MVOC 90 -> 94   (60 cases)
Visibility Category II:     MVOC 95 & 96    (20 cases)
Visibility Category III:    MVOC 97 -> 99   (19 cases)
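
One way to obtain equally populous intervals, and then to assign a predictor value to its interval, is sketched below. The cut-point rule (taking the sorted value at each k/m position) and the convention that a value equal to a cut point falls in the lower interval are assumptions made for illustration; the function names are hypothetical. The EHF cut points in the example are those of Table A1.

```python
from bisect import bisect_left

def equally_populous_cuts(values, m):
    """Upper bounds of the first m-1 intervals, chosen so that each of
    the m intervals holds about len(values)/m cases."""
    ordered = sorted(values)
    return [ordered[(k * len(ordered)) // m - 1] for k in range(1, m)]

def interval_index(value, cuts):
    """1-based interval number; a value equal to a cut point is placed
    in the lower interval (value <= cut)."""
    return bisect_left(cuts, value) + 1

ehf_cuts = [2.65, 4.44]                       # EHF cut points, Table A1
ehf_intervals = [interval_index(v, ehf_cuts) for v in (2.0, 2.65, 3.1, 5.0)]

# Synthetic check: 99 distinct values split into three groups of 33.
demo_cuts = equally_populous_cuts(range(1, 100), 3)
```

Ties in the raw predictor values can make exactly equal populations impossible, so in practice the interval counts may differ slightly.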


II.  SINGLE  PREDICTOR  STATISTICS 

A.  BIVARIATE  PAIRS 

Choose  various  visibility-predictor  pairs  and  make 
bivariate  plots  of  these  pairs.  This  will  provide  immediate 
visual  estimation  of  the  potential  predictability.  As  an 
example,  let  us  suppose  that  predictor  EHF  of  our  artificial 
data  set  has  33  cases  in  each  equally  populous  interval  and 
that  the  visibility  categories  I,  II  and  III  are  respectively 
represented  by  17,  7  and  9  in  interval  1;  1,  7  and  25  in 
interval  2;  1,  6  and  26  in  interval  3.  To  make  the  bivariate 
plot,  simply  make  a  tabular  summary  of  this  information,  as 
illustrated in Fig. 14. Now we define, from the bivariate plot,
our coordinate system and nomenclature. Items in parentheses are
examples from Fig. 14; numbers in brackets are equation numbers
from Preisendorfer (1983 a,b,c), with a letter designator
indicating the paper from which each was obtained.


n = number of visibility categories (n = 3)

m = number of equally populous predictor intervals (m = 3)

j = the vertical counting index (j = 1,...,n)

i = the horizontal counting index (i = 1,...,m)

n(i,j) = individual cell counts (n(1,3) = 9)

n(.,j) = marginal predictand totals = row totals (n(.,2) = 20):

$$ n(\cdot,j) = \sum_{i=1}^{m} n(i,j) $$   [3.1a]

n(i,.) = marginal predictor totals = column totals
(n(2,.) = 33):

$$ n(i,\cdot) = \sum_{j=1}^{n} n(i,j) $$   [3.2a]

n(.,.) = total predictand/predictor pairs = sum over all cells
(n(.,.) = 99):

$$ n(\cdot,\cdot) = \sum_{j=1}^{n} \sum_{i=1}^{m} n(i,j) $$   [3.3a]
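
As a check on the nomenclature, the marginal totals for the EHF example can be computed directly from the cell counts quoted in the text. The variable names are illustrative only; the list layout (outer index i, inner index j) is an assumption of this sketch.

```python
# Cell counts n(i,j) for EHF: outer index i = predictor interval,
# inner index j = visibility category (I, II, III).
counts = [
    [17, 7, 9],    # interval 1
    [1, 7, 25],    # interval 2
    [1, 6, 26],    # interval 3
]

row_totals = [sum(col[j] for col in counts) for j in range(3)]   # n(.,j)
col_totals = [sum(col) for col in counts]                        # n(i,.)
total = sum(col_totals)                                          # n(.,.)
```

These reproduce the values quoted above: n(1,3) = 9, n(.,2) = 20, n(2,.) = 33 and n(.,.) = 99.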


B.  CONDITIONAL  PROBABILITIES 

From  the  bivariate  pairs  determine  the  conditional  proba¬ 
bility  of  visibility  given  a  predictor.  We  will  continue  from 
the  bivariate  plot  in  Fig.  14,  and  define  three  probabilities: 


p12(i,j) = n(i,j)/n(.,.) = joint probability of a
predictand-predictor pair occurring in a given cell = individual
cell count divided by the total number of cases
(p12(3,3) = 26/99 = .2626)   [3.5a]

p1(i) = n(i,.)/n(.,.) = marginal probability of predictor =
column total divided by the total number of cases = the column
sum of the joint probabilities
(p1(2) = 33/99 = .333)   [3.6a]

p2(j) = n(.,j)/n(.,.) = marginal probability of predictand = row
total divided by the total number of cases = the row sum of the
joint probabilities
(p2(2) = 20/99 = .202)   [3.7a]


We  can  now  build  a  joint/marginal  probability  table  as 
illustrated  in  Fig.  15,  and  define  conditional  probability. 


p21(j|i) = p12(i,j)/p1(i) = n(i,j)/n(i,.) = conditional
probability of predictand given a predictor = a cell's joint
probability divided by the marginal probability of predictor =
individual cell count divided by column total
(p21(2|2) = .071/.333 = 7/33 = .212)   [3.8a]
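
The joint, marginal and conditional probabilities for the EHF example follow directly from the cell counts. The short sketch below (variable names are illustrative, and indices are zero-based, so `cond[1][1]` is p21(2|2)) recomputes the sample values quoted in the definitions.

```python
counts = [[17, 7, 9], [1, 7, 25], [1, 6, 26]]   # n(i,j), EHF example
total = sum(sum(col) for col in counts)          # n(.,.)

joint = [[c / total for c in col] for col in counts]             # p12(i,j)
p1 = [sum(col) / total for col in counts]                        # p1(i)
p2 = [sum(col[j] for col in counts) / total for j in range(3)]   # p2(j)
cond = [[c / sum(col) for c in col] for col in counts]           # p21(j|i)

print(round(joint[2][2], 4), round(p1[1], 3),
      round(p2[1], 3), round(cond[1][1], 3))
```

The four printed values correspond to p12(3,3), p1(2), p2(2) and p21(2|2) and agree with the figures given in the text. Each column of `cond` sums to one, as a conditional probability table must.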

Now build a conditional probability table as illustrated in Fig.
16. Conditional probability of visibility, given some predictor,
is the quantity of greatest interest in this study. Note that if
p21(j|i) = 1/n for j = 1,...,n at some i (i.e., each cell
contains 1/n of the cases in its column), then very little
information is available to predict visibility at that i.
However, if p21(j0|i) = 1 for some j0 and p21(j|i) = 0 for all
other j values, then there is perfect predictability of class j0
by the predictor at class i. The underlying methodology of this
study will be to determine the maximum conditional probability of
visibility for each predictor value.


C.  STRATEGIES 


Preisendorfer (1983 a,b,c) presents three different prediction
strategies, two based on maximum probabilities (MAXPROB1 and
MAXPROB2) and one based on natural regression.

1. Maximum Probability

This strategy consists of determining the cell, in a given
column, with the highest conditional probability, and assigning
to the column the visibility category associated with that cell.
As each column represents an interval of predictor values, we now
have a visibility forecast value associated with that interval.
In our example with EHF (Fig. 16), interval 1 (i = 1) will have a
forecast value of visibility category I (VISCAT 1). Hence, if we
used only EHF as a predictor, every time a value of EHF ≤ 2.65
was encountered, we would predict visibility category I.
Similarly, for interval 2 (i = 2) and for interval 3 (i = 3) we
would choose visibility category III (VISCAT 3).

MAXPROB1 and MAXPROB2 differ only in the way they handle a tie
between maximal conditional probabilities in a column. Should
this occur, a decision must be made as to which predictand
category will be assigned to that predictor interval. In
MAXPROB1, this decision is made, figuratively, by a coin toss. A
random number in the unit interval is generated. The unit
interval is divided into a number of subintervals equal to the
number of tied values, and each subinterval is assigned to a
specific predictand category. The subinterval into which the
random number falls determines the forecast visibility category.
In MAXPROB2, the lowest predictand category among the tied
categories is chosen.
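
The two maximum-probability strategies can be sketched as follows. This is a minimal illustration rather than the program used in the study: the function name and arguments are hypothetical, and `rng.choice` stands in for the subdivision of the unit interval described above, which gives each tied category the same chance.

```python
import random

def maxprob_forecast(cond_col, strategy="MAXPROB2", rng=random):
    """Forecast category (1-based) for one predictor interval.

    cond_col holds p21(j|i) for j = 1..n at a fixed interval i.
    """
    best = max(cond_col)
    tied = [j + 1 for j, p in enumerate(cond_col) if p == best]
    if len(tied) == 1 or strategy == "MAXPROB2":
        return tied[0]             # lowest category among the maxima
    return rng.choice(tied)        # MAXPROB1: equal-chance draw among ties

# Conditional probabilities p21(j|i) for the EHF example.
cond = [[17/33, 7/33, 9/33], [1/33, 7/33, 25/33], [1/33, 6/33, 26/33]]
forecasts = [maxprob_forecast(col) for col in cond]
```

For EHF there are no ties, so both strategies forecast categories I, III and III for intervals 1, 2 and 3, as stated in the text.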

2.  Natural Regression

This strategy consists of first finding the average predictand
(visibility category) for each predictor interval, using
conditional probabilities, and then choosing the predictand
category nearest the average.

$$\bar{j}(i) = \sum_{j=1}^{n} j \, p_{21}(j|i)$$   [7.1b]

Fig. 17 shows the computation for EHF interval 1 (i = 1).
Visibility category II (VISCAT 2) would be assigned to this
interval by this strategy.
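
As a rough check of the natural-regression rule, the sketch below computes the probability-weighted mean category and picks the nearest category. The function name is hypothetical, the probabilities for EHF interval 1 are taken as 17/33, 7/33 and 9/33 (the last one inferred as the remainder of the two values quoted in the text), and resolving an exact half-way tie toward the lower category is an assumption.

```python
def natural_regression_category(cond_probs):
    """Return (mean category, nearest category) for one predictor interval.

    cond_probs[j-1] is the conditional probability of category j.
    """
    mean_category = sum(j * p for j, p in enumerate(cond_probs, start=1))
    categories = range(1, len(cond_probs) + 1)
    # min() keeps the first (lowest) category when two are equally near.
    nearest = min(categories, key=lambda j: abs(j - mean_category))
    return mean_category, nearest

mean_cat, forecast = natural_regression_category([17 / 33, 7 / 33, 9 / 33])
print(round(mean_cat, 3), forecast)   # 1.758 2
```

The result agrees with the assignment of VISCAT 2 to this interval, whereas the maximum-probability strategy assigns VISCAT 1 to the same column.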

D.  COMPARISON  STATISTICS 

To determine whether a predictor will be useful in forecasting,
there should be a statistic with which to compare its potential
utility. Preisendorfer (1983 a,b,c) defines four such statistics
and their critical values: potential predictability (PP), the
class-error probabilities ($a_0$, $a_1$), and functional
dependence (FD). Potential predictability and class-error
probabilities will be defined now. Functional dependence will be
addressed later.

1.  Potential Predictability

Potential predictability of a predictand/predictor pair is
defined as:

$$PP(2|1) = \frac{n}{n-1} \sum_{i=1}^{m} p_1(i) \left[ \sum_{j=1}^{n} \left( p_{21}(j|i) - \frac{1}{n} \right)^2 \right] = \sum_{i=1}^{m} p_1(i) \, PP(i)$$

where:

$$PP(i) = \frac{n}{n-1} \sum_{j=1}^{n} \left( p_{21}(j|i) - \frac{1}{n} \right)^2 ,$$

$p_1(i)$ = the marginal probability of a predictor, and

$p_{21}(j|i)$ = the conditional probability of the jth
predictand, given the ith predictor.   [4.1a]

$PP(2|1)$ is loosely related to Shannon's definition of
information [Preisendorfer, 1983a]. An example calculation is
shown in Fig. 18, where EHF has a PP value of .330. To determine
whether this would be the best predictor using this statistic,
compute the potential predictability for all predictors and rank
them from highest to lowest. The predictor with the highest PP
should be the best predictor for forecasting visibility using any
strategy.
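
The PP value quoted for EHF can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the function name and table layout are hypothetical, and the cell counts are back-computed from the conditional probabilities quoted in the text, assuming 33 cases in each of the three EHF intervals.

```python
def potential_predictability(counts):
    """PP(2|1) for a table counts[i][j] (predictor interval i, category j)."""
    total = sum(sum(col) for col in counts)
    n = len(counts[0])                    # number of predictand categories
    pp = 0.0
    for col in counts:
        col_total = sum(col)
        if col_total == 0:
            continue                      # an empty interval contributes nothing
        # PP(i): scaled squared departure from the uniform probability 1/n
        pp_i = n / (n - 1) * sum((c / col_total - 1 / n) ** 2 for c in col)
        pp += (col_total / total) * pp_i  # weight by marginal probability
    return pp

# Assumed EHF counts: rows are intervals 1-3, columns are VISCAT 1-3.
ehf_counts = [[17, 7, 9], [1, 7, 25], [1, 6, 26]]
print(f"{potential_predictability(ehf_counts):.3f}")   # 0.330
```

A column with uniform conditional probabilities gives PP(i) = 0, and a column with all cases in one category gives PP(i) = 1.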


2.  Class-Error Probabilities

Zero-class ($a_0$) and one-class ($a_1$) error probabilities can
be defined to gauge the predictive skill of a prediction
strategy.

$$a_0 = \sum_{i=1}^{m} p_1(i) \, p_{21}(j_0(i)|i)$$

where:

$p_1(i)$ = the marginal probability of the predictor,

$j_0(i)$ = the $j_0$th cell in column i assigned by the
prediction strategy, and

$p_{21}(j_0(i)|i)$ = the conditional probability of cell
$j_0(i)$.   [6.1a]

From Figs. 15 and 16, $p_1(i)$ = .333 for all i; $j_0(1)$ = 1,
$p_{21}(j_0(1)|1)$ = .515; $j_0(2)$ = 3, $p_{21}(j_0(2)|2)$ = .758;
and $j_0(3)$ = 3, $p_{21}(j_0(3)|3)$ = .788. Therefore, if EHF is
the only predictor,

$$a_0 = (.333)(.515) + (.333)(.758) + (.333)(.788) = .686$$

The statistic $a_0$ is, by definition, equal to the fraction of
correct forecasts in the dependent data set. The one-class error
probability is

$$a_1 = \sum_{i=1}^{m} p_1(i) \left[ p_{21}(j_0(i)-1|i) + p_{21}(j_0(i)+1|i) \right]$$

where:

$p_{21}(j_0(i) \pm 1|i)$ = the conditional probabilities adjacent
to the $p_{21}(j_0(i)|i)$ values used in the $a_0$
determination.

If $j_0$ = 1 then, by definition, $p_{21}(j_0(i)-1|i) = 0$;
similarly, if $j_0$ = n then, by definition,
$p_{21}(j_0(i)+1|i) = 0$.   [6.2a]

The statistic $a_1$ is, by definition, equal to the fraction of
forecasts for which a class 1 error has been committed.

Again, from Figs. 15 and 16:

$$a_1 = (.333)(.212 + 0) + (.333)(.212 + 0) + (.333)(.182 + 0) = .202$$
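
The two class-error probabilities can be checked numerically with the sketch below. The function name and argument layout are hypothetical, and the cell counts are again assumed (33 cases per EHF interval, consistent with the conditional probabilities quoted in the text). Working from counts gives 68/99 for the zero-class value, which rounds to .687; the .686 in the text results from rounding the marginal probability to .333.

```python
def class_errors(counts, assigned):
    """Return (a0, a1) for a table counts[i][j] and a forecast category
    assigned[i] (1-based) for each predictor interval."""
    total = sum(sum(col) for col in counts)
    hits = one_class = 0
    for col, j0 in zip(counts, assigned):
        k = j0 - 1                      # 0-based index of the assigned cell
        hits += col[k]
        if k - 1 >= 0:                  # cell below the assigned one, if any
            one_class += col[k - 1]
        if k + 1 < len(col):            # cell above the assigned one, if any
            one_class += col[k + 1]
    return hits / total, one_class / total

ehf_counts = [[17, 7, 9], [1, 7, 25], [1, 6, 26]]     # assumed counts
a0, a1 = class_errors(ehf_counts, assigned=[1, 3, 3])  # maximum probability
print(f"{a0:.3f} {a1:.3f}")   # 0.687 0.202
```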

To determine which of two or more predictors is the most
skillful, we can plot the ($a_0$, $a_1$) pairs on a skill diagram
as in Fig. 19. The dashed lines are lines of constant class error
(CE = $a_1$ + 2$a_2$, where $a_2$ is the two-class error
probability) and the more skillful predictors will lie on the
lower right part of the triangle. In general, the skill on the
diagram decreases according to the zig-zag rule shown in the
figure. If, for all predictors, $a_1$ is constant, which may occur
during the first predictor determination with a data set
containing relatively few poor visibility cases, then the best
predictor is the one with the greatest $a_0$ value. In this
instance there is no need to plot the pairs.

III.  MULTIPLE PREDICTOR STATISTICS

Once all predictand/predictor pairs have been formed and
potential predictability and skill scores determined, the
predictors can be ordered by decreasing predictor skill and by
potential predictability. Fig. 20 contains the bivariate plot,
conditional probabilities, potential predictability and skill
scores for the remaining three predictors in our artificial data
set. The ordering of predictors is shown in Table A2. EHF would
therefore be chosen as our first predictor, as illustrated on the
skill diagram in Fig. 19. As RH, FTER and ASTD have equal $a_0$
and $a_1$ values, they are ranked according to decreasing
potential predictability.


TABLE A2

RANKING OF PREDICTORS BY SKILL
AND POTENTIAL PREDICTABILITY

                 a0      a1      PP

1st     EHF      .686    .202    .330
2nd     RH       .606    .202    .225
3rd     FTER     .606    .202    .211
4th     ASTD     .606    .202    .209

Preisendorfer (1983b) develops statistics, similar to those
already mentioned, for multiple predictors. The main conceptual
difficulty of additional predictors is the increase of
dimensions. One predictor presents a relatively simple
two-dimensional problem (predictor 1 vs. predictand); two
predictors present a three-dimensional problem (predictor 1 vs.
predictor 2 vs. predictand); three or more predictors present
four-dimensional and larger problems. However, with a little
manipulation, all of the problems of more than two dimensions can
be reduced to a two-dimensional problem. This is illustrated in
Figs. 21 and 22 for three dimensions (two predictors) and four
dimensions (three predictors). An easily programmable equation
can be developed to create these two-dimensional arrays, based
upon the number of equally populous intervals for each predictor
and upon the interval in which a particular data case resides.

In our continuing example, reduce the equally populous intervals
for each predictor to an integer number (i = 1,...,m) with 1
corresponding to the lowest interval and m corresponding to the
highest interval, as defined for the predictor index in Section
II.A. Let

ii   = the interval integer number for EHF,
jj   = the interval integer number for RH,
kk   = the interval integer number for FTER,
mm   = the interval integer number for ASTD,
ll   = the column location in the two-dimensional bivariate
       plot (equivalent to i for a single predictor),
IGP1 = the total number of intervals for EHF,
IGP2 = the total number of intervals for RH,
IGP3 = the total number of intervals for FTER,
IGP4 = the total number of intervals for ASTD.

Then, for one predictor, EHF:

    ll = ii

for two predictors, EHF and RH:

    ll = IGP2(ii-1) + jj

for three predictors, EHF, RH and FTER:

    ll = IGP2(ii-1+IGP1(kk-1)) + jj

for four predictors, EHF, RH, FTER and ASTD:

    ll = IGP2(ii-1+IGP1(kk-1+IGP3(mm-1))) + jj

This equation form can be expanded to accommodate any number
of predictors.
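
One possible way of programming the column-location equations for any number of predictors is sketched below in Python. The function name and the tuple arguments are hypothetical; the nesting follows the pattern of the equations above, in which the second predictor varies fastest, then the first, then the third, fourth and so on. The extension beyond four predictors is an assumption based on that pattern.

```python
def column_index(intervals, n_intervals):
    """Column location in the two-dimensional array.

    intervals   : 1-based interval numbers, in the order the predictors
                  were chosen (the ii, jj, kk, mm, ... of the text)
    n_intervals : number of intervals per predictor (IGP1, IGP2, ...)
    """
    if len(intervals) == 1:
        return intervals[0]
    # Second predictor is the fastest-varying index, then the first,
    # then the third, fourth, ... as in the nested equations.
    order = [1, 0] + list(range(2, len(intervals)))
    col = 0
    for pos in reversed(order):          # Horner-style evaluation
        col = col * n_intervals[pos] + (intervals[pos] - 1)
    return col + 1

# Three predictors with three intervals each: ii = 2, jj = 3, kk = 2
print(column_index((2, 3, 2), (3, 3, 3)))   # 15 = 3*(2-1 + 3*(2-1)) + 3
```

The largest index equals the product of the interval counts, so four predictors with three intervals each give columns 1 through 81.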

IV.  FUNCTIONAL  DEPENDENCE 

After the first predictor has been selected, either from its
skill score or potential predictability, we need a means to
determine whether or not to add a new predictor to the one(s)
already chosen. For this purpose, Preisendorfer (1983c) proposes
a functional dependence index (FD) which describes the
dependence of the new predictor being considered upon those
already in the set of predictors. If FD is large (on the scale 0
to 1), then the new predictor can be represented by predictors
already chosen and its inclusion into the set of predictors would
be redundant. However, if FD is small (on the scale 0 to 1), then
the new predictor is likely to be a useful addition to the
existing collection of predictors.

$$FD(2|1) = \frac{m}{2(m-1)} \sum_{i=1}^{m} \sum_{j=1}^{n} p_{12}(i,j) \, |q(i,j) - r(i,j)|$$   (2.1c)

where:

$$q(i,j) = \sum_{k=1}^{n-j} p_{21}(j+k|i+1) + \sum_{k=1}^{j-1} p_{21}(j-k|i-1)$$   (2.2c)

   = the sum of the conditional probabilities which lie in
     column i+1 and rows greater than j and the conditional
     probabilities which lie in column i-1 and rows less than j

   = the sum of the conditional probabilities to the right and
     up, and to the left and down. The upper left (1,n) and
     lower right (m,1) cells will always have q values equal
     to zero.

$$r(i,j) = \sum_{k=1}^{j-1} p_{21}(j-k|i+1) + \sum_{k=1}^{n-j} p_{21}(j+k|i-1)$$   (2.3c)

   = the sum of the conditional probabilities which lie in
     column i+1 and rows less than j and the conditional
     probabilities which lie in column i-1 and rows greater
     than j

   = the sum of the conditional probabilities to the right and
     down, and to the left and up. The upper right (m,n) and
     lower left (1,1) cells will always have r values equal
     to zero.

$p_{12}(i,j)$ and $p_{21}(j \pm k|i \pm 1)$ = the joint and
     conditional probabilities defined earlier, differing only
     in that the abscissa and ordinate are now predictor vs.
     predictor rather than predictor vs. visibility.

Fig. 23 illustrates the FD computation for RH given EHF.
In this example, FD(2|1) = FD(RH|EHF) = .286.
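
The FD equations translate fairly directly into code. The Python sketch below is one reading of equations (2.1c) to (2.3c), not the program used in this study: the function name and the count-table layout are hypothetical, and conditional probabilities outside the table (column 0 or m+1) are taken to be zero. The data of Fig. 23 are not reproduced here, so the example uses artificial tables.

```python
def functional_dependence(counts):
    """FD(2|1) for counts[i][j]: column i of the earlier predictor,
    row j of the new predictor (both stored 0-based)."""
    m, n = len(counts), len(counts[0])
    total = sum(sum(col) for col in counts)
    cond = [[c / sum(col) if sum(col) else 0.0 for c in col] for col in counts]

    def p21(j, i):
        """Conditional probability with 1-based indices; zero off the table."""
        if 1 <= i <= m and 1 <= j <= n:
            return cond[i - 1][j - 1]
        return 0.0

    acc = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            up = range(1, n - j + 1)      # k = 1, ..., n-j
            down = range(1, j)            # k = 1, ..., j-1
            q = sum(p21(j + k, i + 1) for k in up) + sum(p21(j - k, i - 1) for k in down)
            r = sum(p21(j - k, i + 1) for k in down) + sum(p21(j + k, i - 1) for k in up)
            acc += counts[i - 1][j - 1] / total * abs(q - r)
    return m / (2 * (m - 1)) * acc

# A perfectly dependent (diagonal) table reaches the top of the scale.
print(round(functional_dependence([[5, 0, 0], [0, 5, 0], [0, 0, 5]]), 6))   # 1.0
```

In this reading a completely uniform 3 x 3 table does not give zero, because the edge columns have neighbours on one side only; this is one reason a significance test against randomly generated tables is useful.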

V.  CRITICAL  VALUES 

Once the various statistics have been found, a means to determine
whether they are significant must be established. Preisendorfer
(1983 a,b,c) proposes the use of Monte Carlo means, applied as
follows.

From the bivariate plot, as in Figs. 14, 21b and 22b, we
determine the marginal probabilities of the predictor ($p_1(i)$)
and establish incremental values from 0 to 1 (note that for
equally populous predictor intervals, $p_1(i)$ = 1/m, a constant,
where m = the number of intervals). We then cast a total of
n(.,.) randomly generated numbers into the intervals to simulate
a new data set. After each randomly generated data case is cast
into a column, it is placed into a cell using uniform
probability. Fig. 24 shows the incremental values associated with
the bivariate plot in Fig. 21b. In our continuing example we have
n(.,.) = 99, so we would generate 99 random numbers in the unit
interval. All random numbers ≤ .071 would be placed in column
i = 1; those greater than .071 and ≤ .192 would be placed in
column i = 2; and so on. As each data case is placed into a
column, a single random number is generated to determine into
which cell the case is to be placed (e.g., a random number ≤ .33
would be counted in cell (i,1); a random number greater than .33
and ≤ .66 would be counted in cell (i,2); etc.). After all 99
cases have been cast into their appropriate cells, all of the
statistics previously discussed would be computed and saved. This
process would be repeated 100 times so that we would have an
array containing 100 randomly generated potential
predictabilities, $a_0$'s, $a_1$'s and FD's. These would be
sorted from lowest to highest and the 96th value (PP(96),
$a_0$(96), $a_1$(96) and FD(96)) would determine the upper 5%
critical value and the 5th value (PP(05), $a_0$(05), $a_1$(05)
and FD(05)) would determine the lower 5% critical value. For all
statistics other than FD, we want values from our dependent data
set to be greater than the upper 5% or less than the lower 5%
critical values. For FD we want values lower than the upper 5%
critical value to ensure that our second, and subsequent,
predictor is not significantly dependent on the previous
predictor(s).
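
The Monte Carlo procedure can be sketched as follows. This is an illustration under stated assumptions, not the code used in this study: all function names are hypothetical, only potential predictability is simulated (any other statistic taking a count table could be passed in instead), and Python's standard pseudo-random generator stands in for whatever generator was originally used.

```python
import random

def potential_predictability(counts):
    """PP(2|1) for a table counts[i][j] (interval i, category j)."""
    total = sum(sum(col) for col in counts)
    n = len(counts[0])
    pp = 0.0
    for col in counts:
        col_total = sum(col)
        if col_total:                     # skip empty intervals
            pp_i = n / (n - 1) * sum((c / col_total - 1 / n) ** 2 for c in col)
            pp += col_total / total * pp_i
    return pp

def simulate_counts(marginals, n_categories, n_cases, rng):
    """Cast cases into columns by marginal probability, then into a
    cell of that column with uniform probability."""
    cumulative, running = [], 0.0
    for p in marginals:
        running += p
        cumulative.append(running)        # the incremental values
    counts = [[0] * n_categories for _ in marginals]
    for _ in range(n_cases):
        u = rng.random()
        i = next((k for k, c in enumerate(cumulative) if u <= c),
                 len(marginals) - 1)      # guards against rounding in the sum
        j = int(rng.random() * n_categories)
        counts[i][j] += 1
    return counts

def critical_values(statistic, marginals, n_categories, n_cases,
                    trials=100, seed=1):
    """Lower and upper 5% critical values (needs trials >= 20)."""
    rng = random.Random(seed)
    values = sorted(statistic(simulate_counts(marginals, n_categories,
                                              n_cases, rng))
                    for _ in range(trials))
    tail = trials // 20                   # 5% of the trials
    return values[tail - 1], values[trials - tail]   # 5th and 96th of 100

lower, upper = critical_values(potential_predictability, [1 / 3] * 3, 3, 99)
print(lower <= upper)   # True
```

A PP value from the dependent data that exceeds `upper` would then be judged significant at the upper 5% level.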

VI.  CHOOSING  PREDICTORS 

The first predictor is determined as shown in Section III; that
is, by computing initial PP, $a_0$ and $a_1$ values for each
predictor, ordering them by skill score and PP, and choosing the
one with the greatest skill score, or the greatest PP in the
event that all skill scores are identical.

Subsequent predictors will be subjected to two tests: functional
dependence and skill score. Let

p = the number of predictors already chosen,

$a_0(k-1)$ and $a_1(k-1)$ = the 0- and 1-class errors of the
    previous stage of construction of the developmental model,

k = the index of the current stage.

Then, for the next (kth) predictor to be accepted it should meet
the following three conditions:

(1)  FD < FD(96|i)   (i = 1,...,p)

(2)  $a_0(k) > a_0(k-1)$  and  $a_1(k) \le a_1(k-1)$

(3)  $a_0(k) \ge a_0(96)$  and  $a_1(k) < a_1(05)$

If condition (1) is not met but conditions (2) and (3) are, then
a predictor may still be used, but the increase of predictability
of the predictand will, on average, be less than if condition (1)
had been met. However, if conditions (2) and (3) are not met,
then the predictor should not be considered further. Repeat this
process at all stages for all remaining predictors until no
further predictors are available, then stop the construction of
the developmental model.

VII.  TESTING THE DEVELOPMENTAL MODEL ON INDEPENDENT DATA

Once the model has been developed and no further predictors
remain to be considered, we can test it for skill ($a_0$, $a_1$)
on an independent data set (any set whose numbers were not used
to develop the model). This is easily accomplished by sorting the
independent data case values into predictor intervals, determined
from the dependent data, and calculating the location in the
forecast array (ll in Figs. 21b and 22b) of the appropriate
prediction, using the equations established in Section III. It is
to be expected that, on average, the test ($a_0$, $a_1$) points
on the skill diagram for an independent data set will not be as
skillful as those for the set of developmental points.


APPENDIX  B 


LINEAR  REGRESSION  AND  THRESHOLD  MODELS 

A.  LINEAR  REGRESSION 

In  this  study  a  least-squares,  multiple  linear  regression 
model,  known  as  BMDP2R  in  the  BMDP  Statistical  Software 
[University  of  California,  1981] ,  was  used.  The  procedure 
used  is  called  forward  step-wise  selection  and  picks  the 
predictors  (of  the  many  offered)  that  have  the  highest 
correlation  with  the  predictand  (visibility)  based  upon  F-to- 
enter  and  F-to-remove  limits,  where  F  is  a  ratio  which  tests 
the  significance  of  the  coefficients  of  the  predictors  in 
the  regression  equation. 

The regression model fitted to the data is

$$y = a + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p + e$$

where:

y = the dependent variable (predictand), which can be either a
    continuous function or a discrete value

$x_1, \ldots, x_p$ = the independent variables (predictors)

$b_1, \ldots, b_p$ = the regression coefficients

a = the intercept

p = the number of independent variables

e = the error with mean zero.

The predicted value $\hat{y}$, and the general form of the
resulting equation, is

$$\hat{y} = a + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$

The step-wise selection of predictors continues until there are
no predictors remaining which meet the F-to-enter criteria. The
regression equation generated at each step is printed, along with
its R-value (the correlation of the dependent variable y with the
predicted value $\hat{y}$) and $R^2$. The resulting set of
equations, one for each step, is reviewed, and the equation
containing only those predictors which increased $R^2$ by at
least .01 is retained for application.

The role of regression, once appropriate predictor variables have
been selected, is simply that of dimension reduction
(representing a multivariate structure by a univariate proxy
which constitutes a classificatory or predictive index). This
proxy takes the form of a polynomial, linear in its coefficients,
of the components of the multivariate structure. The problem now
becomes one of determining the form of the class conditional
distributions (one for each group of interest; e.g., 1, 2 and 3
for visibility categories I, II and III, as used in this study).
Once an appropriate form has been selected, it remains to
determine the parameters of the class conditional distributions
(e.g., means and variances) and then apply the decision criteria
or threshold model.

B.  THRESHOLDS [LOWE, 1984a]

1.  Notation

E = an event; this is an indicator variable such that when
    E = 1 the threatening event occurs, and when E = 0 the
    non-threatening event occurs.

C = the classification of an unknown event, such that when
    C = 1 the event is classified as a threat, and when C = 0
    the event is classified as a non-threat.

$P[E = 1]$ = unconditional probability of occurrence of threat.

$P[E = 0]$ = unconditional probability of occurrence of
    non-threat.

Error of the 1st kind (false alarm): $[C = 1 \cap E = 0]$.

Error of the 2nd kind (miss): $[C = 0 \cap E = 1]$.

$P[C = 1 \cap E = 0]$ = joint probability of an error of the
    1st kind.

$P[C = 0 \cap E = 1]$ = joint probability of an error of the
    2nd kind.

$P[C = 1 | E = 0]$ = class conditional probability of
    misclassifying a non-threat.

$P[C = 0 | E = 1]$ = class conditional probability of
    misclassifying a threat.

$P[C = 1 \cap E = 0] = P[C = 1 | E = 0] \, P[E = 0]$.

$P[C = 0 \cap E = 1] = P[C = 0 | E = 1] \, P[E = 1]$.

z = a value of the predictive index (equivalent to $\hat{y}$,
    above).

Z = range of the predictive index on the real line.

For a dichotomous problem, Z is partitioned into two parts,
$Z_0$ and $Z_1$:

    C = 0 if $z \in Z_0$
    C = 1 if $z \in Z_1$

The decision regions are mutually exclusive and exhaustive
(i.e., $Z_0 \cap Z_1 = \emptyset$ and $Z = Z_0 \cup Z_1$).

Thresholds = boundary(s) between decision regions.

$p(z|E=0)$ = class conditional density of z given that E = 0.

$p(z|E=1)$ = class conditional density of z given that E = 1.

$\Lambda(z) = p(z|E=1)/p(z|E=0)$ = the maximum likelihood ratio
    (i.e., the ratio of class conditional densities).

$P_e = P\{[C = 1 \cap E = 0] \cup [C = 0 \cap E = 1]\}$ = the
    total probability of error.

2.  Minimum Probability of Error Criterion

$P_e$ = probability of an incorrect classification.

$$P_e = P[C=1|E=0] \, P[E=0] + P[C=0|E=1] \, P[E=1]$$

where $P[E=1] + P[E=0] = 1$. Note that the events E = 1 and E = 0
are mutually exclusive and exhaustive. The objective is to select
decision regions (thresholds) so as to minimize $P_e$.

$$P[C=0|E=1] = \int_{z \in Z_0} p(z|E=1) \, dz$$

    = the probability of misclassifying E = 1.

$$P[C=0|E=1] = \int_{z \in Z_0} p(z|E=1) \, dz + \int_{z \in Z_1} p(z|E=1) \, dz - \int_{z \in Z_1} p(z|E=1) \, dz$$

$$P[C=0|E=1] = 1 - \int_{z \in Z_1} p(z|E=1) \, dz$$

$$P[C=1|E=0] = \int_{z \in Z_1} p(z|E=0) \, dz$$

These are substituted into the expression for $P_e$; then,

$$P_e = P[E=0] \int_{z \in Z_1} p(z|E=0) \, dz + P[E=1] \left[ 1 - \int_{z \in Z_1} p(z|E=1) \, dz \right]$$

and algebraic rearrangement yields

$$P_e = P[E=1] + \int_{z \in Z_1} \{ P[E=0] \, p(z|E=0) - P[E=1] \, p(z|E=1) \} \, dz$$

In order to minimize $P_e$, $Z_1$ (the decision region for C = 1)
will include all those values of z for which the integrand in the
expression for $P_e$ is negative. The decision regions can be
symbolically represented as follows:

$$Z_0 = \{ z : P[E=0] \, p(z|E=0) - P[E=1] \, p(z|E=1) > 0 \}$$

$$Z_1 = \{ z : P[E=0] \, p(z|E=0) - P[E=1] \, p(z|E=1) < 0 \}$$

An alternative representation is given by

$$Z_0 = \{ z : P[E=0] \, p(z|E=0) > P[E=1] \, p(z|E=1) \} = \{ z : P[E=0]/P[E=1] > p(z|E=1)/p(z|E=0) \}$$

Likewise,

$$Z_1 = \{ z : P[E=0]/P[E=1] < p(z|E=1)/p(z|E=0) \}$$

These statements can be combined to give

$$\Lambda(z) = \frac{p(z|E=1)}{p(z|E=0)} \; \underset{C=0}{\overset{C=1}{\gtrless}} \; \frac{P[E=0]}{P[E=1]}$$

Thresholds are the value(s) of z for which

$$\Lambda(z) = P[E=0]/P[E=1]$$

This equation can be solved for z either analytically or
numerically, depending on the forms of the density functions.
3.  Threshold Cases

In order to exemplify the model, the assumption is made that the
class conditional distributions are Gaussian. There are
essentially three distinct cases that can arise.

a.  Case I: Equal variances; different means (Referred to as
    the equal variance model in the text)

$$p(z|E=1) = k \exp\{ -\tfrac{1}{2} (z - \mu_1)^2 / \sigma^2 \}$$

$$p(z|E=0) = k \exp\{ -\tfrac{1}{2} (z - \mu_0)^2 / \sigma^2 \}$$

where:

$$k = (2\pi)^{-1/2} \sigma^{-1}$$

$$\Lambda(z) = \frac{\exp\{ -\tfrac{1}{2} (z - \mu_1)^2 / \sigma^2 \}}{\exp\{ -\tfrac{1}{2} (z - \mu_0)^2 / \sigma^2 \}} \; \underset{C=0}{\overset{C=1}{\gtrless}} \; \frac{P_0}{P_1}$$

where $P_0 = P[E=0]$ and $P_1 = P[E=1]$. Thus, the threshold
value is

$$z^* = \frac{\mu_0 + \mu_1}{2} + \frac{\sigma^2 \ln(P_0/P_1)}{\mu_1 - \mu_0}$$

The position of the threshold depends on the relative values of
$P_1$ and $P_0$. The threshold moves toward the group with the
smaller prior probability. If $P_0 = P_1$, the threshold will be
the value of z where the densities intersect (i.e., where the
densities are equal).
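
The equal variance threshold is simple enough to verify numerically. The Python sketch below uses hypothetical function names and artificial parameter values; it evaluates the threshold and confirms that the prior-weighted densities are equal there, which is the defining condition for a threshold.

```python
import math

def normal_pdf(z, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def equal_variance_threshold(mu0, mu1, sigma, p0, p1):
    """Case I threshold z* for class means mu0, mu1, common standard
    deviation sigma, and prior probabilities p0 = P[E=0], p1 = P[E=1]."""
    return (mu0 + mu1) / 2 + sigma ** 2 * math.log(p0 / p1) / (mu1 - mu0)

# Artificial example: the threat class (E = 1) is the rarer one.
z_star = equal_variance_threshold(mu0=0.0, mu1=2.0, sigma=1.0, p0=0.8, p1=0.2)
print(round(z_star, 3))   # 1.693
# At the threshold the weighted densities agree:
print(math.isclose(0.8 * normal_pdf(z_star, 0.0, 1.0),
                   0.2 * normal_pdf(z_star, 2.0, 1.0)))   # True
```

With equal priors the same call returns the midpoint of the two means; here the threshold has moved from 1.0 toward the mean of the rarer class.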

b.  Case II: Equal means; different variances

$$\Lambda(z) = \frac{\sigma_0 \exp\{ -\tfrac{1}{2} (z - \mu_1)^2 / \sigma_1^2 \}}{\sigma_1 \exp\{ -\tfrac{1}{2} (z - \mu_0)^2 / \sigma_0^2 \}} \; \underset{C=0}{\overset{C=1}{\gtrless}} \; \frac{P_0}{P_1}$$

with $\mu_0 = \mu_1 = \mu$ and the threshold

$$z^* = \mu \pm \left[ \frac{2 \sigma_0^2 \sigma_1^2}{\sigma_1^2 - \sigma_0^2} \ln\left( \frac{P_0 \sigma_1}{P_1 \sigma_0} \right) \right]^{1/2}$$

Note that in this situation there are two thresholds. The group
having the smaller variance will lie between the two thresholds.

The thresholds shown are typical of a situation where
$P_1 < P_0$. Note that these thresholds lie between the two
intersections of the densities. If the inequality of prior
probabilities were reversed, the thresholds would lie outside of
the region between the two density intersections. Further note
that the decision region for the group having the lesser variance
lies between the thresholds.


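c.  Case III: General Solution (Referred to as the quadratic
    model in the text)

$$p(z|E=1) = (k/\sigma_1) \exp\{ -\tfrac{1}{2} (z - \mu_1)^2 / \sigma_1^2 \}$$

$$p(z|E=0) = (k/\sigma_0) \exp\{ -\tfrac{1}{2} (z - \mu_0)^2 / \sigma_0^2 \}$$

$$\exp\left\{ \tfrac{1}{2} \left[ \frac{(z - \mu_0)^2}{\sigma_0^2} - \frac{(z - \mu_1)^2}{\sigma_1^2} \right] \right\} \; \underset{C=0}{\overset{C=1}{\gtrless}} \; \frac{P_0 \sigma_1}{P_1 \sigma_0}$$

where $k = (2\pi)^{-1/2}$. Algebraic manipulation produces

$$(\sigma_1^2 - \sigma_0^2) z^2 + 2 (\sigma_0^2 \mu_1 - \sigma_1^2 \mu_0) z + \left[ (\sigma_1^2 \mu_0^2 - \sigma_0^2 \mu_1^2) - 2 \sigma_0^2 \sigma_1^2 \ln(P_0 \sigma_1 / P_1 \sigma_0) \right] \; \underset{C=0}{\overset{C=1}{\gtrless}} \; 0$$

which is recognizable as a quadratic in z, with thresholds

$$z^* = \frac{-b \pm (b^2 - 4ac)^{1/2}}{2a}$$

where:

$$a = \sigma_1^2 - \sigma_0^2$$

$$b = 2 (\sigma_0^2 \mu_1 - \sigma_1^2 \mu_0)$$

$$c = (\sigma_1^2 \mu_0^2 - \sigma_0^2 \mu_1^2) - 2 \sigma_0^2 \sigma_1^2 \ln(P_0 \sigma_1 / P_1 \sigma_0)$$

[Figure: class conditional densities plotted against the
classification index (z)]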


The remarks given for the figures in cases I and II are also
applicable here. More often than not, only one of a pair of
thresholds induced by differing variances will be of real
interest. If the variances of the two groups are radically
different, then both members of the threshold pair become
important.
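
The quadratic model can be checked in the same way as the equal variance model. The following Python sketch uses hypothetical function names and artificial parameters; it solves the quadratic for the thresholds and falls back on the linear solution when the variances are equal, in which case the result coincides with the Case I threshold.

```python
import math

def normal_pdf(z, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def quadratic_thresholds(mu0, s0, mu1, s1, p0, p1):
    """Thresholds of the general Gaussian case, in ascending order.

    s0 and s1 are standard deviations; p0 = P[E=0], p1 = P[E=1].
    An empty list means the weighted densities never cross.
    """
    a = s1 ** 2 - s0 ** 2
    b = 2 * (s0 ** 2 * mu1 - s1 ** 2 * mu0)
    c = (s1 ** 2 * mu0 ** 2 - s0 ** 2 * mu1 ** 2) \
        - 2 * s0 ** 2 * s1 ** 2 * math.log(p0 * s1 / (p1 * s0))
    if a == 0:                      # equal variances: the equation is linear
        return [] if b == 0 else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])

# Artificial example with different means and variances.
for z in quadratic_thresholds(0.0, 1.0, 2.0, 2.0, 0.7, 0.3):
    # At each threshold the prior-weighted densities agree.
    print(math.isclose(0.7 * normal_pdf(z, 0.0, 1.0),
                       0.3 * normal_pdf(z, 2.0, 2.0)))   # True (twice)
```

Setting the two means equal reproduces the symmetric pair of thresholds of Case II.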

In the foregoing, normal class conditional distributions were
assumed. This was done because the Gaussian form admits of a
rather clean analytical solution. However, the general concept of
the minimum probable error decision criterion may be applied to
any form of density function. Indeed, the density function of one
group need not even be the same form as that for another group
(one might be exponential and the other Gaussian). The difficulty
with most non-Gaussian forms is that they seldom admit of closed
analytical forms and require numerical means in the determination
of thresholds.


APPENDIX  C 


NORTHERN HEMISPHERE PREDICTOR PARAMETERS AVAILABLE
FOR THE NORTH PACIFIC OCEAN, JULY 1979, EXPERIMENTS

Area:  30°-60°N; 145°E-130°W

Model output time:  0000 GMT (TAU00)

A.  Model output        Descriptive name of parameters
    parameters

Primitive equation model

    TX                  Surface air temperature
    EX                  Surface vapor pressure
    EHF                 Evaporative heat flux
    SEHF                Sensible plus evaporative heat flux
    THF                 Total heat flux
    H510                1000-500 mb thickness anomaly
    GGTHTA              Surface-front location parameter
    FTER                Advective fog probability

Mass structure model

    PS                  Surface pressure
    TAIR                Surface air temperature
    EAIR                Surface vapor pressure
    TSEA                Sea surface temperature
    SSANOM              Sea surface temperature anomaly
    T925                925 mb temperature
    U925                925 mb zonal wind component
    V925                925 mb meridional wind component
    NCLOUD              Total cloud cover

Marine wind model

    WWW                 Marine surface wind speed
    DDWW                Marine surface wind direction

Climatological parameter

    CLIMO               National Climatic Center fog
                        frequency climatology

Derived parameters

    ASTD                TAIR-TSEA
    RH                  Surface relative humidity

APPENDIX D

NOGAPS PREDICTOR PARAMETERS AVAILABLE FOR THE NORTH
ATLANTIC OCEAN, 15 MAY-15 JULY 1983, EXPERIMENTS

Area:  Entire North Atlantic Ocean and Mediterranean Sea

Model output time:  1200 GMT (TAU00)

A.  Model output        Descriptive name of parameter
    parameter

    D1000               1000 mb geopotential height
    D925                925 mb geopotential height
    D850                850 mb geopotential height
    D700                700 mb geopotential height
    D500                500 mb geopotential height
    D400  *             400 mb geopotential height
    D300  *             300 mb geopotential height
    D250  *             250 mb geopotential height
    TAIR                Surface air temperature
    T1000               1000 mb temperature
    T925                925 mb temperature
    T700                700 mb temperature
    T500                500 mb temperature
    T400  *             400 mb temperature
    T300  *             300 mb temperature
    T250  *             250 mb temperature
    EAIR                Surface vapor pressure
    E1000               1000 mb vapor pressure
    E925                925 mb vapor pressure
    E850                850 mb vapor pressure
    E700                700 mb vapor pressure
    E500                500 mb vapor pressure
    UBLW                Boundary layer zonal wind component
    U1000               1000 mb zonal wind component
    U925                925 mb zonal wind component
    U850                850 mb zonal wind component
    U700                700 mb zonal wind component
    U500                500 mb zonal wind component
    U400  *             400 mb zonal wind component
    U300  *             300 mb zonal wind component
    U250  *             250 mb zonal wind component
    VBLW                Boundary layer meridional wind component
    V1000               1000 mb meridional wind component
    V925                925 mb meridional wind component
    V850                850 mb meridional wind component
    V700                700 mb meridional wind component
    V500                500 mb meridional wind component
    V400  *             400 mb meridional wind component
    V300  *             300 mb meridional wind component
    V250  *             250 mb meridional wind component
    VOR925  **          925 mb vorticity
    VOR500  **          500 mb vorticity
    PS                  Surface pressure
    SMF                 Surface moisture flux
    PBLD                Planetary boundary-layer depth
    STRTFQ              Percent stratus frequency
    STRTTH              Stratus thickness
    SHF                 Surface heat flux
    ENTRN               Entrainment at top of marine boundary-layer
    DRAG  **            Drag coefficient (CD)


Derived parameters

    DTDP                Vertical gradient of temperature
    DEDP                Vertical gradient of vapor pressure
    DUDP                Vertical gradient of zonal wind
    DVDP                Vertical gradient of meridional wind
    RH                  Surface relative humidity

    BM1  ***            2.81132 + (.16201 x EAIR)
                        - (.00237 x E850) - (.0739 x T925)
                        - (.16179 x E925)

    BM2  ***            2.08302 + (.36810 x TAIR)
                        - (.26675 x T1000) - (.15980 x T925)

    BM3  ***            3.00866 + (.11771 x EAIR)
                        - (.01024 x E850) - (.19321

    BM4  ***            2.42235 - (.000418 x UBLW)
                        + (.000255 x U700)

    BM5  ***            2.55859 - (.000355 x V1000)

    BM6  ***            2.57317 + (.000893 x D1000)
                        - (.0000489 x D700)

    BM7  ***            -15.2173 + (.01764 x PS)
                        - (.01007 x STRTFQ) + (.02642 x STRTTH)
                        + (.06042 x SHF)


*  Parameters  which  were  not  used  due  to  their  being 
considered  as  having  little  likelihood  of  being 
important  in  forecasting  marine  visibility. 

**  Parameters  which  were  not  used  due  to  loss  of 
significant  digits  during  transfer  from  tape 
to  mass  storage. 

***  Linear  regression  equation  parameters. 


APPENDIX  E 


SKILL  AND  THREAT  SCORES 


               3  |   R     S     T
    FORECAST   2  |   U     V     W
               1  |   X     Y     Z
                  +------------------
                      1     2     3
                        OBSERVED

Total = R + S + T + U + V + W + X + Y + Z

P1 = (R+U+X)/Total        P3 = (T+W+Z)/Total

P2 = (S+V+Y)/Total        PN = greatest of P1, P2 or P3

Raw scores

A0 = % correct = (X+V+T)/Total

A1 = 1-class error = (U+S+Y+W)/Total

TS1 = Threat score for visibility category I
    = X/(R+U+X+Y+Z)

TS2 = Threat score for visibility category II
    = V/(S+U+V+W+Y)

TS12 = Threat score for visibility categories I and II
     = (X+V)/(Total-T)
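
The raw scores can be computed directly from the nine cell counts, as in the Python sketch below. The function name is hypothetical. TS2 is computed with the denominator S+U+V+W+Y, i.e. all cases in which category II was either observed or forecast, which is an assumption about the intended formula. The example cell values are assumed: they correspond to the maximum-probability forecasts for the artificial EHF data of Appendix A (category I forecast for interval 1, category III for intervals 2 and 3).

```python
def skill_and_threat_scores(R, S, T, U, V, W, X, Y, Z):
    """Raw scores from a 3 x 3 contingency table.

    X, V and T are the correct forecasts of categories I, II and III;
    columns (R,U,X), (S,V,Y), (T,W,Z) are observed categories I, II, III.
    No guard is included for empty denominators.
    """
    total = R + S + T + U + V + W + X + Y + Z
    return {
        "A0": (X + V + T) / total,            # fraction correct
        "A1": (U + S + Y + W) / total,        # fraction off by one class
        "TS1": X / (R + U + X + Y + Z),       # category I
        "TS2": V / (S + U + V + W + Y),       # category II (assumed form)
        "TS12": (X + V) / (total - T),        # categories I and II
    }

scores = skill_and_threat_scores(R=2, S=13, T=51, U=0, V=0, W=0, X=17, Y=7, Z=9)
print({name: round(value, 3) for name, value in scores.items()})
# {'A0': 0.687, 'A1': 0.202, 'TS1': 0.486, 'TS2': 0.0, 'TS12': 0.354}
```

Because category II is never forecast in this example, TS2 is zero even though the fraction correct is fairly high.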

TS12 is designed to represent the skill of forecasting
visibility categories I and II as separate categories, rather
than their skill as a combined category, which would be
(U+V+X+Y)/(Total-T).

APPENDIX F


TABLES 


TABLE I.  A SUMMARY OF THE OBSERVATIONS (PERCENTAGE
          FREQUENCIES) OF THREE VISIBILITY CATEGORIES
          (VISCAT'S), FOR THE NORTH ATLANTIC OCEAN
          HOMOGENEOUS AREAS SHOWN IN FIG. 1, 15 MAY-
          15 JULY 1983

         NUMBER OF
AREA     OBSERVATIONS    VISCAT I       VISCAT II      VISCAT III

1         2725            163 (.06)      436 (.16)      2126 (.78)
2         2867            277 (.10)      317 (.11)      2273 (.79)
3E         131              8 (.06)       31 (.24)        92 (.70)
3W        2288            437 (.19)      284 (.12)      1567 (.68)
4         4771            129 (.03)      597 (.13)      4045 (.85)
5E        1087              9 (.01)       94 (.09)       984 (.91)
5W        2307              8 (.003)      40 (.02)      2259 (.98)
6N         580             19 (.03)       45 (.08)       516 (.89)
6M        2337             21 (.01)      131 (.06)      2185 (.93)
6S          60              1 (.02)        2 (.03)        57 (.95)
7          801              7 (.01)       34 (.04)       760 (.95)
8         1284              1 (.001)      27 (.02)      1256 (.98)

ENTIRE NORTH ATLANTIC AND MEDITERRANEAN

        21,238           1080 (.05)     2038 (.10)    18,120 (.85)


TABLE II. NUMBER OF OBSERVATIONS (PERCENTAGE FREQUENCIES)
OF THREE VISIBILITY CATEGORIES (VISCAT'S),
AND 95% CONFIDENCE INTERVALS FOR THE
DEPENDENT AND INDEPENDENT DATA, FOR THE NORTH
PACIFIC OCEAN AND AREA 3W OF THE NORTH
ATLANTIC OCEAN


North Pacific Ocean, July 1979

                                                                 TOTAL # OF
                    VISCAT I       VISCAT II      VISCAT III     OBSERVATIONS

95% CI              .207-.229      .126-.144      .635-.660
Dependent data       816 (.222)     498 (.135)    2368 (.643)    3682
Independent data     388 (.211)     246 (.134)    1207 (.656)    1841
Total               1204 (.218)     744 (.135)    3575 (.647)    5523


North Atlantic Ocean area 3W, MAY-JUN 1983

                                                                 TOTAL # OF
                    VISCAT I       VISCAT II      VISCAT III     OBSERVATIONS

95% CI              .175-.207      .111-.138      .666-.704
Dependent data       296 (.194)     190 (.125)    1040 (.682)    1526
Independent data     141 (.185)      94 (.123)     527 (.692)     762
Total                437 (.191)     284 (.124)    1567 (.685)    2288
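
The 95% confidence intervals in Table II can be checked with a short sketch. The method is an assumption, since the table does not state how the intervals were computed: a normal-approximation interval for a binomial proportion, applied to the combined (dependent plus independent) counts, agrees with the printed limits to three decimals. The function name is hypothetical.

```python
import math


def proportion_ci(count, total, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p = count / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p - half_width, p + half_width


# VISCAT I, North Pacific Ocean, July 1979: 1204 of 5523 observations.
pacific_low, pacific_high = proportion_ci(1204, 5523)

# VISCAT III, North Atlantic Ocean area 3W: 1567 of 2288 observations.
atlantic_low, atlantic_high = proportion_ci(1567, 2288)
```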

TABLE III. THE INITIAL FIVE BEST PREDICTORS FOR
EPI'S OF FOUR THROUGH TEN, FOR EACH
STRATEGY, WITH ASSOCIATED PP, a0, a1
AND CE VALUES FROM THE NORTH PACIFIC
OCEAN DEPENDENT DATA, JULY 1979

[For EPI's 4 through 8 the predictor names (apart from five
occurrences of CLIMO, whose rows cannot be identified) and
the PP values are illegible.]

                           Maximum-probability     Natural-regression

EPI  Predictor    PP       a0     a1     CE        a0     a1     CE

 4   [...]       [...]    .684   .135   .497      .491   .467   .551
     [...]       [...]    .681   .135   .503      .478   .475   .569
     [...]       [...]    .680   .135   .505      .482   .468   .568
     [...]       [...]    .657   .135   .551      .471   .478   .580
     [...]       [...]    .649   .135   .567      .508   .442   .542

 5   [...]       [...]    .697   .135   .471      .435   .538   .592
     [...]       [...]    .688   .135   .489      .535   .400   .530
     [...]       [...]    .678   .135   .509      .539   .396   .526
     [...]       [...]    .658   .135   .549      .449   .518   .584
     [...]       [...]    .658   .135   .549      .418   .549   .615

 6   [...]       [...]    .695   .135   .475      .491   .467   .551
     [...]       [...]    .690   .135   .485      .478   .475   .569
     [...]       [...]    .673   .135   .519      .574   .349   .503
     [...]       [...]    .661   .135   .543      .508   .442   .542
     [...]       [...]    .659   .135   .547      .471   .478   .580

 7   [...]       [...]    .693   .135   .479      .529   .415   .527
     [...]       [...]    .685   .135   .495      .523   .417   .537
     [...]       [...]    .675   .135   .515      .523   .417   .537
     [...]       [...]    .661   .135   .543      .435   .528   .602
     [...]       [...]    .659   .135   .547      .308   .654   .730

 8   [...]       [...]    .688   .135   .489      .491   .467   .551
     [...]       [...]    .681   .135   .503      .478   .475   .569
     [...]       [...]    .680   .135   .505      .553   .377   .517
     [...]       [...]    .663   .135   .539      .404   .567   .625
     [...]       [...]    .657   .135   .551      .508   .441   .543

 9   EHF         .340     .693   .135   .479      .522   .425   .531
     SEHF        .322     .686   .135   .493      .514   .429   .543
     FTER        .324     .683   .135   .499      .574   .349   .503
     CLIMO       .299     .663   .135   .539      .443   .516   .598
     RH          .315     .657   .135   .551      .476   .482   .566

10   EHF         .341     .696   .135   .473      .491   .467   .551
     SEHF        .323     .688   .135   .489      .534   .401   .531
     FTER        .322     .678   .135   .509      .539   .396   .526
     CLIMO       .300     .662   .135   .541      .418   .549   .615
     RH          .316     .658   .135   .549      .508   .441   .543
TABLE IV. FIRST-STAGE CONTINGENCY TABLE STATISTICS
A0, TS1, AA0 AND ATS1 FOR BOTH DEPENDENT
AND INDEPENDENT NORTH PACIFIC OCEAN, JULY
1979, DATA, FOR EPI'S OF FOUR THROUGH TEN
AND THE MAXIMUM-PROBABILITY STRATEGY, WITH
EHF AS THE FIRST PREDICTOR FOR EACH NUMBER
OF EPI'S


            Dependent data                 Independent data

EPI     A0     TS1    AA0    ATS1      A0     TS1    AA0    ATS1

 4     .684    .36    .113    .17     .686    .34    .087    .16
 5     .697    .35    .150    .17     .695    .33    .114    .15
 6     .695    .32    .145    .13     .696    .30    .117    .12
 7     .693    .30    .139    .10     .693    .28    .107    .09
 8     .688    .27    .126    .06     .694    .27    .110    .08
 9     .693    .36    .139    .17     .695    .34    .114    .16
10     .696    .35    .149    .17     .695    .33    .114    .15
TABLE V. FD(96), FD, RSS FD AND a0 FOR STRATEGY
MAXPROB2, NORTH PACIFIC OCEAN, JULY 1979,
DEPENDENT DATA, FOR THOSE PREDICTORS
SELECTED AT EACH STAGE OF THE DEVELOPMENTAL
MODEL USING FIVE EPI'S. FD(96) IS COMPUTED
FROM 100 RANDOMLY GENERATED DATA SETS,
AS EXPLAINED IN APPENDIX A, AND PROVIDES
A MEASURE OF HOW MUCH ADDITIONAL
PREDICTABILITY MAY BE EXPECTED FROM THE INCLUSION
OF A NEW PREDICTOR. IDEALLY, RSS FD
SHOULD BE LESS THAN FD(96)


                      FD, of predictor added, on
Predictor
added       FD(96)    EHF      DDWW     H510     RH       RSS FD    a0

EHF           -        -        -        -        -         -      .697
DDWW        .1399    .1494      -        -        -       .1494    .699
H510        .1978    .2488    .2185      -        -       .3311    .704
RH          .2423    .2606    .2087    .1515      -       .3666    .746
THF         .2798    .3290    .1464    .1678    .1907     .4408    .820
CLIMO       .3128    .3558    .1727    .1823    .2551       *      .882

*RSS FD was not computed for CLIMO as the choice for
the sixth predictor was between only CLIMO and SEHF.
It was more economical to compute contingency table
statistics for each and to choose the best predictor
from those results.
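
The RSS FD column of Table V can be reproduced from the individual FD entries in each row. The sketch below assumes that RSS stands for root-sum-square, which this table does not spell out; under that reading the computed values match the printed ones to the fourth decimal. The function name is hypothetical.

```python
import math


def rss_fd(fd_values):
    """Root-sum-square of the FD values of a candidate predictor on the
    predictors already in the model (assumed meaning of RSS FD)."""
    return math.sqrt(sum(fd * fd for fd in fd_values))


# Rows of Table V: FD of the added predictor on each earlier predictor.
h510 = rss_fd([0.2488, 0.2185])                  # table lists .3311
rh = rss_fd([0.2606, 0.2087, 0.1515])            # table lists .3666
thf = rss_fd([0.3290, 0.1464, 0.1678, 0.1907])   # table lists .4408
```

Each result can then be compared with the FD(96) value in the same row, as the caption suggests.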
TABLE VI. CONTINGENCY TABLES AND RELATED STATISTICS FOR
BOTH DEPENDENT (3682 OBSERVATIONS) AND
INDEPENDENT (1841 OBSERVATIONS) NORTH PACIFIC
OCEAN, JULY 1979, DATA, FROM STAGE FOUR OF
THE DEVELOPMENTAL MODEL. PREDICTORS ARE EHF,
DDWW, H510 AND RH, EACH DIVIDED INTO FIVE
EPI'S, FOR (A) MAXPROB1, (B) MAXPROB2 AND
(C) NATURAL-REGRESSION


(a) MAXPROB1


DEPENDENT DATA

           3     316    301   2198
FORECAST   2      29     79     29
           1     471    118    141

                   1      2      3
                    OBSERVED

A0   = .75     AA0   = .29
A1   = .13
TS1  = .44     ATS1  = .28
TS2  = .14     ATS2  = .01
TS12 = .37     ATS12 = .02


INDEPENDENT DATA

           3     175    162   1065
FORECAST   2      24     26     35
           1    [...]    58    107

                   1      2      3
                    OBSERVED

A0   = [...]   AA0   = .12
A1   = [...]
TS1  = [...]   ATS1  = .17
TS2  = [...]   ATS2  = -.06
TS12 = [...]   ATS12 = -.10


TABLE VI (CONT.)


(b) MAXPROB2


DEPENDENT DATA

[Contingency table partly illegible; surviving entries:]

     238   2077
     152    228

A0   = .75     AA0   = [...]
A1   = .13
TS1  = .47     ATS1  = .32
TS2  = .18     ATS2  = .06
TS12 = .42     ATS12 = .10


INDEPENDENT DATA

[Contingency table partly illegible; surviving entries:]

     135    136   1007
      81    152

A0   = .69     AA0   = .09
A1   = .16
TS1  = .37     ATS1  = .20
TS2  = .09     ATS2  = -.05
TS12 = .31     ATS12 = -.05


TABLE VI (CONT.)

(c) Natural-Regression


DEPENDENT DATA

[Contingency table partly illegible; surviving entries:]

     171   1773
     279    565

A0   = .62     AA0   = -.06
A1   = .35
TS1  = .27     ATS1  = .06
TS2  = .18     ATS2  = .05
TS12 = .27     ATS12 = -.13


INDEPENDENT DATA

[Contingency table partly illegible; surviving entries:]

      91    857
     128    298

A0   = .58     AA0   = -.21
A1   = .35
TS1  = .19     ATS1  = -.02
TS2  = .17     ATS2  = .04
TS12 = .22     ATS12 = -.19
TABLE VII. LINEAR-REGRESSION EQUATION FOR THE PREDICTED
VALUE OF THE VISIBILITY CATEGORY ($\hat{y}$), $\hat{y}$
STATISTICS WITH RESPECT TO THE ACTUAL VISIBILITY
CATEGORIES (y) AND THRESHOLD VALUES
FROM THE EQUAL-VARIANCE ASSUMPTION MODEL,
NORTH PACIFIC OCEAN, JULY 1979. NOTATION
IS AS IN APPENDIX B.


$\hat{y} = 3.78586 + .04118\,(\mathrm{EHF}) - .91412\,(\mathrm{FTER}) - .01592\,(\mathrm{RH})$


Class conditional distributions (i.e., distribution of $\hat{y}$ for
a given y).

       Number of       Frequency     Mean value         Standard deviation
y      observations    of y          of $\hat{y}$       of $\hat{y}$

1        816           .222          2.077 (m1)         .348
2        498           .135          2.263 (m2)         .382
3       2368           .643          2.568 (m3)         .353

T1 = threshold between y = 1 and y = 2 = 2.506
T2 = threshold between y = 2 and y = 3 = 1.768
T3 = threshold between y = 1 and y = 3 = 2.048


State conditional distributions for visibility category I
(y = 1), II (y = 2) and III (y = 3) depicting threshold
values and means.
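
The three thresholds listed in Table VII can be reproduced from the class counts, means and standard deviations in the same table. The sketch below is an assumption about the equal-variance model, since Appendix B is not reproduced here: it uses the textbook decision boundary between two Gaussian classes with a common variance, takes the class counts as prior weights, and pools the three class variances by count. Under those assumptions the computed values agree with the printed thresholds to about three decimals. Function names are hypothetical.

```python
import math


def pooled_variance(counts, std_devs):
    """Count-weighted average of the class variances (an assumption)."""
    return sum(n * s * s for n, s in zip(counts, std_devs)) / sum(counts)


def equal_variance_threshold(mean_i, mean_j, n_i, n_j, variance):
    """Point where two equal-variance Gaussian classes, weighted by their
    counts, are equally likely."""
    midpoint = (mean_i + mean_j) / 2.0
    return midpoint + variance / (mean_j - mean_i) * math.log(n_i / n_j)


# Class statistics from Table VII (North Pacific Ocean, July 1979).
counts = [816, 498, 2368]
means = [2.077, 2.263, 2.568]
std_devs = [0.348, 0.382, 0.353]

var = pooled_variance(counts, std_devs)
t_1_2 = equal_variance_threshold(means[0], means[1], counts[0], counts[1], var)
t_2_3 = equal_variance_threshold(means[1], means[2], counts[1], counts[2], var)
t_1_3 = equal_variance_threshold(means[0], means[2], counts[0], counts[2], var)
```

Because the threshold between categories 2 and 3 falls below the one between categories 1 and 2, no value of the predicted category is assigned to category II under this rule, which is consistent with the TS2 of 0.0 reported for linear regression in Table VIII.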


TABLE VIII. CONTINGENCY TABLES AND RELATED STATISTICS
FROM LINEAR REGRESSION, FOR BOTH DEPENDENT
(3682 OBSERVATIONS) AND INDEPENDENT (1841
OBSERVATIONS) NORTH PACIFIC OCEAN, JULY
1979, DATA


DEPENDENT DATA

[Contingency table partly illegible; surviving entries:]

     389    342   2131
     427    156    237

TS2 = 0.0   [remaining statistics illegible]


INDEPENDENT DATA

[Contingency table partly illegible; surviving entries:]

            176   1076
     199     70    131

A0   = .69     AA0   = [...]
A1   = .13
TS1  = .34     ATS1  = [...]
TS2  = 0.0     ATS2  = [...]
TS12 = .26     ATS12 = [...]


TABLE IX. THE INITIAL FIVE BEST PREDICTORS FOR EPI'S
OF FOUR THROUGH TEN, FOR EACH STRATEGY,
WITH ASSOCIATED PP, a0, a1 AND CE VALUES
FROM THE NORTH ATLANTIC OCEAN AREA 3W
DEPENDENT DATA, 15 MAY-15 JULY 1983,
WITHOUT LINEAR-REGRESSION EQUATIONS AS
PREDICTORS


                           Maximum-probability     Natural-regression

EPI  Predictor    PP       a0     a1     CE        a0     a1     CE

 4   E850        .372     .697   .125   .482      .514   .446   .526
     SHF         .376     .691   .125   .493      .512   .455   .521
     DTDP        .344     .685   .125   .505      .611   .304   .474
     E925        .359     .685   .125   .505      .505   .453   .537
     SMF         .334     .682   .125   .511      .606   .301   .487

 5   E925        .367     .702   .125   .472      .564   .379   .494
     E850        .375     .700   .125   .475      .576   .370   .478
     DTDP        .344     .699   .125   .477      .528   .409   .535
     SHF         .379     .698   .125   .479      .567   .383   .483
     SMF         .337     .686   .125   .503      .526   .409   .539

 6   DTDP        .353     .710   .125   .456      .568   .360   .503
     E850        .374     .699   .125   .477      .609   .324   .458
     SMF         .341     .699   .125   .477      .563   .360   .514
     E925        .363     .695   .125   .485      .595   .334   .476
     SHF         .374     .693   .125   .489      .512   .455   .521

 7   DTDP        .356     .716   .125   .443      .514   .429   .542
     SMF         .348     .706   .125   .463      .590   .325   .495
     E850        .379     .699   .125   .477      .561   .389   .489
     E925        .364     .692   .125   .491      .547   .400   .506
     SHF         .376     .691   .125   .493      .548   .407   .497

 8   SMF         .352     .714   .125   .448      .543   .386   .528
     DTDP        .356     .712   .125   .451      .611   .304   .474
     E850        .378     .700   .125   .475      .588   .355   .469
     SHF         .379     .691   .125   .493      .512   .455   .521
     E925        .364     .685   .125   .505      .577   .360   .486
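
The CE column of Table IX is not defined in this appendix. As a hedged observation rather than the report's stated definition, the printed values are consistent, to within rounding, with reading CE as an expected class error: one-class misses (a1) counted once and two-class misses (1 - a0 - a1) counted twice. The sketch below checks that reading against a few rows; the function name is hypothetical.

```python
def expected_class_error(a0, a1):
    """Assumed reading of CE: one-class errors weighted by 1 and
    two-class errors weighted by 2."""
    a2 = 1.0 - a0 - a1          # fraction of forecasts off by two classes
    return a1 + 2.0 * a2


# (a0, a1, printed CE) triples taken from Table IX.
rows = [
    (0.697, 0.125, 0.482),      # EPI 4, E850, maximum-probability
    (0.514, 0.446, 0.526),      # EPI 4, E850, natural-regression
    (0.710, 0.125, 0.456),      # EPI 6, DTDP, maximum-probability
    (0.611, 0.304, 0.474),      # EPI 4, DTDP, natural-regression
]
differences = [abs(expected_class_error(a0, a1) - ce) for a0, a1, ce in rows]
```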


TABLE X. FIRST-STAGE CONTINGENCY TABLE STATISTICS A0,
TS1, AA0 AND ATS1 FOR BOTH DEPENDENT AND
INDEPENDENT NORTH ATLANTIC OCEAN AREA 3W,
15 MAY-15 JULY 1983, DATA, FOR EPI'S OF FOUR
THROUGH TEN AND THE MAXIMUM-PROBABILITY
STRATEGY, WITHOUT LINEAR-REGRESSION EQUATIONS
AS PREDICTORS


                        Dependent                   Independent
       Best
EPI    Predictor    A0    TS1   AA0   ATS1      A0    TS1   AA0   ATS1

 4     E850        .70    .32   .05    .15     .69    .30  -.01    .14
 5     E925        .70    .30   .06    .13     .71    .30   .05    .14
 6     DTDP        .71    .32   .09    .15     .71    .29   .05    .13
 7     DTDP        .72    .31   .11    .14     .71    .28   .07    .11
 8     SMF         .71    .28   .10    .10     .73    .29   .13    .13
 9     SMF         .71    .26   .10    .08     .73    .26   .11    .09
10     SMF         .71    .26   .09    .08     .73    .24   .15    .06


TABLE XI. SAME AS TABLE IX, EXCEPT WITH LINEAR-
REGRESSION EQUATIONS AS PREDICTORS

[Predictor names, PP values and the maximum-probability
columns are illegible. The leading digit of each
natural-regression a0 value is also lost and is shown as "?".]

          Natural-regression

EPI       a0      a1      CE

 4       .?62    .282    .394
         .?65    .270    .400
         .?16    .455    .512
         .?12    .461    .515
         .?14    .446    .526

 5       .?89    .380    .442
         .?90    .374    .446
         .?66    .387    .482
         .?64    .393    .480
         .?64    .379    .494

 6       .?28    .332    .413
         .?25    .328    .422
         .?04    .338    .453
         .?17    .454    .512
         .?68    .360    .503

 7       .?50    .303    .397
         .?75    .393    .457
         .?54    .406    .486
         .?80    .505    .536
         .?14    .429    .542

 8       .?06    .358    .431
         .?01    .358    .440
         .?85    .364    .466
         .?75    .378    .472
         .?43    .386    .528


TABLE XIV. FD(96), FD, RSS FD AND a0 FOR STRATEGY
MAXPROB2, NORTH ATLANTIC OCEAN AREA 3W, 15
MAY-15 JULY 1983, DEPENDENT DATA, WITHOUT
LINEAR-REGRESSION EQUATIONS AS PREDICTORS,
FOR THOSE PREDICTORS SELECTED AT EACH STAGE
OF THE DEVELOPMENTAL MODEL USING FIVE EPI'S.
FD(96) IS COMPUTED FROM 100 RANDOMLY GENERATED
DATA SETS, AS EXPLAINED IN APPENDIX A, AND
PROVIDES A MEASURE OF HOW MUCH ADDITIONAL
PREDICTABILITY MAY BE EXPECTED FROM THE
INCLUSION OF A NEW PREDICTOR. IDEALLY, RSS
FD SHOULD BE LESS THAN FD(96).


                      FD, of predictor added, on
Predictor
Added       FD(96)    E925    U700    DVDP    STRTFQ    ENTRN    RSS FD


TABLE XVI. SAME AS TABLE XV, EXCEPT FOR EIGHT EPI'S


TABLE XVII. CONTINGENCY TABLES AND RELATED STATISTICS FOR
BOTH DEPENDENT (1526 OBSERVATIONS) AND
INDEPENDENT (762 OBSERVATIONS) NORTH ATLANTIC
OCEAN AREA 3W, 15 MAY-15 JULY 1983, DATA,
WITHOUT LINEAR-REGRESSION EQUATIONS AS
PREDICTORS, FROM STAGE FIVE OF THE
DEVELOPMENTAL MODEL. PREDICTORS ARE SMF, D850,
RH, UBLW AND ENTRN, EACH DIVIDED INTO EIGHT
EPI'S, FOR (a) MAXPROB1, (b) MAXPROB2 AND
(c) NATURAL-REGRESSION


(a) MAXPROB1


DEPENDENT DATA

A0   = .98     AA0   = .95
A1   = .01
TS1  = .95     ATS1  = .94
TS2  = .91     ATS2  = .90
TS12 = .95     ATS12 = .92


INDEPENDENT DATA

A0   = .70     AA0   = .04
A1   = .16
TS1  = .34     ATS1  = .19
TS2  = .15     ATS2  = .03
TS12 = .27     ATS12 = -.05


TABLE XVII (CONT.)


(b) MAXPROB2


DEPENDENT DATA

A0   = .98     AA0   = .95
A1   = .01
TS1  = .95     ATS1  = .94
TS2  = .92     ATS2  = .90
TS12 = .95     ATS12 = .92


INDEPENDENT DATA

A0   = .66     AA0   = -.10
A1   = .19
TS1  = .33     ATS1  = .18
TS2  = .14     ATS2  = .02
TS12 = .27     ATS12 = -.05


TABLE XVII (CONT.)


(c) Natural-Regression


DEPENDENT DATA

           3    [...]    10   1031
FORECAST   2      15    179      9
           1     281      1      0

                   1      2      3
                    OBSERVED

A0   = .98     AA0   = .93
A1   = .02
TS1  = .95     ATS1  = .93
TS2  = .84     ATS2  = .81
TS12 = .93     ATS12 = .90


INDEPENDENT DATA

           3      54     56    407
FORECAST   2      30     28     91
           1      57     10     29

                   1      2      3
                    OBSERVED

A0   = .65     AA0   = -.15
A1   = .25
TS1  = .32     ATS1  = .16
TS2  = .13     ATS2  = .01
TS12 = .24     ATS12 = -.10
TABLE XVIII. SAME AS TABLE XVII, EXCEPT FOR FIVE
EPI'S. PREDICTORS ARE E925, U700, DVDP,
STRTFQ AND ENTRN


(a) MAXPROB1


DEPENDENT DATA

A0   = .92     AA0   = .74
A1   = .05
TS1  = .77     ATS1  = .71
TS2  = .63     ATS2  = .57
TS12 = .75     ATS12 = .63


INDEPENDENT DATA

[The final digit of each skill score is lost and is shown as "?".]

A0   = .72     AA0   = .0?
A1   = .16
TS1  = .35     ATS1  = .2?
TS2  = .14     ATS2  = .0?
TS12 = .29     ATS12 = -.0?

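TABLE XVIII (CONT.)

(b) MAXPROB2


DEPENDENT DATA

[Contingency table partly illegible; surviving entries:]

      11     12    970
       2    148     36

TS12 = .78     ATS12 = [...]


INDEPENDENT DATA

[Contingency table partly illegible; surviving entries:]

      49    426

TS12 = .32     ATS12 = [...]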


TABLE XVIII (CONT.)


(c) Natural-Regression


DEPENDENT DATA

A0   = .88     AA0   = .63
A1   = .12
TS1  = .72     ATS1  = .65
TS2  = .44     ATS2  = .36
TS12 = .51     ATS12 = .28


INDEPENDENT DATA

A0   = .68     AA0   = -.05
A1   = .23
TS1  = .34     ATS1  = .19
TS2  = .15     ATS2  = .03
TS12 = .27     ATS12 = -.05


TABLE XIX. CONTINGENCY TABLES AND RELATED STATISTICS
FOR BOTH DEPENDENT (1526 OBSERVATIONS) AND
INDEPENDENT (762 OBSERVATIONS) NORTH ATLANTIC
OCEAN AREA 3W, 15 MAY-15 JULY 1983, DATA,
WITH LINEAR-REGRESSION EQUATIONS AS PREDICTORS,
FROM STAGE FOUR OF THE DEVELOPMENTAL MODEL.
PREDICTORS ARE BM1, U850, D500 AND V850,
EACH DIVIDED INTO FOUR EPI'S, FOR (a) MAXPROB1,
(b) MAXPROB2 AND (c) NATURAL-REGRESSION


(a) MAXPROB1


DEPENDENT DATA

           3      97    120    990
FORECAST   2       6     21      5
           1     193     49     45

                   1      2      3
                    OBSERVED

A0   = .79     AA0   = .34
A1   = .12
TS1  = .50     ATS1  = .37
TS2  = .10     ATS2  = -.02
TS12 = .40     ATS12 = .12


INDEPENDENT DATA

           3      45     74    499
FORECAST   2       4      5      4
           1      92     15     24

                   1      2      3
                    OBSERVED

A0   = [...]   AA0   = .29
A1   = .13
TS1  = .51     ATS1  = .40
TS2  = .05     ATS2  = -.09
TS12 = .37     ATS12 = .09


TABLE XIX (CONT.)


(b) MAXPROB2


DEPENDENT DATA

           3      77    109    967
FORECAST   2       3     21      9
           1     216     60     64

                   1      2      3
                    OBSERVED

AA0  = .34
ATS1 = .40
ATS2 = -.02
TS12 = .42     ATS12 = [...]
[remaining statistics illegible]


INDEPENDENT DATA

           3      36     68    481
FORECAST   2       3      8      6
           1     102     18     40

                    OBSERVED


TABLE XX. SAME AS TABLE XIX, EXCEPT RESULTS ARE FROM
STAGE TWO IN THE DEVELOPMENTAL MODEL AND
PREDICTORS ARE DIVIDED INTO EIGHT EPI'S
EACH. PREDICTORS ARE BM1 AND U500


(a) MAXPROB1


DEPENDENT DATA

A0   = .75     AA0   = .23
A1   = .13
TS1  = .43     ATS1  = .29
TS2  = .06     ATS2  = -.07
TS12 = .33     ATS12 = .02


INDEPENDENT DATA

A0   = .75     AA0   = .17
A1   = .13
TS1  = .43     ATS1  = .30
TS2  = 0.0     ATS2  = -.14
TS12 = .30     ATS12 = -.01

TABLE XX (CONT.)


(b) MAXPROB2


DEPENDENT DATA

           3    [...]   118    943
FORECAST   2    [...]     6      4
           1    [...]    66     93

                   1      2      3
                    OBSERVED

A0   = .75     AA0   = .23
A1   = .13
TS1  = .45     ATS1  = .31
TS2  = .03     ATS2  = -.11
TS12 = .36     ATS12 = .06


INDEPENDENT DATA

A0   = .74     AA0   = .16
A1   = .13
TS1  = .44     ATS1  = .32
TS2  = 0.0     ATS2  = -.14
TS12 = .33     ATS12 = .02


TABLE XXI. LINEAR-REGRESSION EQUATIONS FOR THE PREDICTED
VALUE OF THE VISIBILITY CATEGORY ($\hat{y}$), FOR BOTH
REGRESSION METHODS, $\hat{y}$ STATISTICS WITH RESPECT
TO THE ACTUAL VISIBILITY CATEGORIES (y) AND
THRESHOLD VALUES FROM BOTH THRESHOLD MODELS,
NORTH ATLANTIC OCEAN AREA 3W, 15 MAY-15 JULY
1983. NOTATION IS AS IN APPENDIX B


A. Definitions:


1 - Linear regression method 1: single equation,
three visibility categories

2 - Linear regression method 2: Decision-tree; two
equations, two visibility categories each

a - All predictors were made available to the
regression model.

b - Only the best predictors from the Preisendorfer
(1983 a,b,c) methodology were made available
to the regression model

A - Quadratic threshold model (Case III, Appendix B)

B - Equal variance threshold model (Case I, Appendix B)

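B. LR1a


$\hat{y} = 2.81132 + .16201\,(\mathrm{EAIR}) - .00237\,(\mathrm{E850}) - .07319\,(\mathrm{T925}) - .16179\,(\mathrm{E925})$


Class conditional distributions (i.e., the distribution of $\hat{y}$
for a given y).

       Number of          Frequency     Mean value
       observations       of            of
y      of y               y (p)         $\hat{y}$ (m)

1      [...]              .194          2.014 (m1)
2      [...]              .125          2.324 (m2)
3      1040               .682          2.652 (m3)

Standard deviation of $\hat{y}$ ($\sigma$): .434 and .379 survive;
the third value, and the class to which each belongs, are illegible.
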
TABLE XXI (CONT.)


LR1aA

Threshold between y = 1 and y = 2 = 2.275
Threshold between y = 2 and y = 3 = 1.839
Threshold between y = 1 and y = 3 = 2.008

(The second threshold value, of the pair, was of no interest.
See Appendix B.)


LR1aB

$T_a$ = threshold between y = 1 and y = 2 = 2.368
$T_b$ = threshold between y = 2 and y = 3 = 1.768
$T_c$ = threshold between y = 1 and y = 3 = 2.060


State conditional distributions for visibility category
I (y = 1), II (y = 2) and III (y = 3) depicting
threshold values and means.

TABLE XXI (CONT.)


C. LR2a

Equation 1:

$\hat{y} = .90305 + .06122\,(\mathrm{EAIR}) + .11284 \times 10^{-4}\,(\mathrm{D850}) - .08438\,(\mathrm{E850}) - .04083\,(\mathrm{T925})$


Class conditional distributions

       Number of          Frequency     Mean value        Standard
       observations       of            of                deviation
y      of y               y (p)         $\hat{y}$ (m)     of $\hat{y}$ ($\sigma$)

0       486               .318          .479 (m0)         .222
1      1040               .682          .776 (m1)         .209

LR2aA: $T_1$ = threshold between y = 0 and y = 1 = .4979

LR2aB: $T_a$ = threshold between y = 0 and y = 1 = .5110

State conditional distributions for combined visibility
categories I and II (y = 0) and visibility category III
(y = 1) depicting threshold values and means


TABLE XXI (CONT.)


Equation 2:

$\hat{y} = .01229 - .18917 \times 10^{-3}\,(\mathrm{U1000}) - .02088\,(\mathrm{T500}) + .1339 \times 10^{-3}\,(\mathrm{U500}) + .15259 \times 10^{-4}\,(\mathrm{D925}) - .32705 \times 10^{-2}\,(\mathrm{STRTFQ}) + 7.50153\,(\mathrm{DEDP}) - .03279\,(\mathrm{DVDP})$

Class conditional distributions

       Number of          Frequency     Mean value        Standard
       observations       of            of                deviation
y      of y               y (p)         $\hat{y}$ (m)     of $\hat{y}$ ($\sigma$)

0      296                .609          .319 (m0)         .186
1      190                .391          .503 (m1)         .194

LR2aA: $T_1$ = threshold between y = 0 and y = 1 = .5102

LR2aB: $T_a$ = threshold between y = 0 and y = 1 = .4972

State conditional distributions for visibility category I
(y = 0) and II (y = 1) depicting threshold values and means.
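
The two-equation decision tree (linear regression method 2) can be sketched as a pair of threshold tests. This is an illustration under stated assumptions, not the report's program: the function name is hypothetical, the inputs are the predicted values from Equation 1 and Equation 2, the default thresholds are the LR2aB (equal-variance) values .5110 and .4972 from this table, and the handling of a value exactly on a threshold is an arbitrary choice.

```python
def decision_tree_viscat(y_hat_eq1, y_hat_eq2, t_eq1=0.5110, t_eq2=0.4972):
    """Assign a visibility category from the two regression outputs.

    Equation 1 separates combined categories I and II (y = 0) from
    category III (y = 1). Equation 2 is consulted only on the I/II
    branch and separates category I (y = 0) from category II (y = 1).
    """
    if y_hat_eq1 >= t_eq1:
        return 3
    return 2 if y_hat_eq2 >= t_eq2 else 1


# Values near the class means listed for the two equations.
near_iii = decision_tree_viscat(0.776, 0.40)   # Equation 1 mean for y = 1
near_ii = decision_tree_viscat(0.479, 0.503)   # Equation 2 mean for y = 1
near_i = decision_tree_viscat(0.479, 0.319)    # Equation 2 mean for y = 0
```

Substituting .4979 and .5102 for the defaults gives the corresponding sketch for the quadratic-threshold variant, LR2aA.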


TABLE XXI (CONT.)


D. LR2b

Equation 1:

$\hat{y} = .89952 - .04830\,(\mathrm{E850}) + .02472\,(\mathrm{SHF}) + 2.17081\,(\mathrm{DTDP}) + 6.81684\,(\mathrm{DEDP})$


Class conditional distributions

       Number of          Frequency     Mean value        Standard
       observations       of            of                deviation
y      of y               y (p)         $\hat{y}$ (m)     of $\hat{y}$ ($\sigma$)

0       486               .318          .496 (m0)         .220
1      1040               .682          .768 (m1)         .201

LR2bA: $T_1$ = threshold between y = 0 and y = 1 = .4922

LR2bB: $T_a$ = threshold between y = 0 and y = 1 = .5119

State conditional distributions for visibility categories
I and II (y = 0) and visibility category III (y = 1)
depicting threshold values and means.


TABLE XXIII. SAME AS TABLE XXII, EXCEPT USING THE
EQUAL-VARIANCE THRESHOLD MODEL


LR1aB (Table XXI)


DEPENDENT DATA

[Contingency table illegible.]


INDEPENDENT DATA

[Contingency table illegible.]

A0 = .75     AA0 = .22
[remaining statistics illegible]


TABLE XXV. SAME AS TABLE XXIV, EXCEPT USING THE EQUAL-
VARIANCE THRESHOLD MODEL


TABLE XXVI. CONTINGENCY TABLES AND RELATED STATISTICS FROM
LINEAR REGRESSION METHOD 2 (DECISION-TREE),
QUADRATIC THRESHOLD MODEL, FOR BOTH DEPENDENT
(1526 OBSERVATIONS) AND INDEPENDENT (762
OBSERVATIONS) NORTH ATLANTIC OCEAN AREA 3W,
15 MAY-15 JULY 1983, DATA, WITH ONLY THOSE
PREDICTORS IDENTIFIED AS BEST BY THE
PREISENDORFER METHODOLOGY AVAILABLE TO THE
REGRESSION MODEL


TABLE XXVII. SAME AS TABLE XXVI, EXCEPT USING THE
EQUAL-VARIANCE THRESHOLD MODEL


LR2bB (Table XXI)
Fig. 1. Homogeneous Areas for the North Atlantic Ocean
[...] and July, from Lowe (1984b)


EQUALLY POPULOUS INTERVALS


Fig. 2a. The behavior of contingency table statistics
for dependent (A0, dashes; TS1, solid) and
independent (A0, chaindots; TS1, chaindashes)
data, as the number of EPI's is varied, for
the North Atlantic Ocean area 3W, 15 May-15
July 1983, when predictors are chosen based upon
the maximum increase of a0 in the dependent
data, for (a) a single predictor (SMF), (b) two
predictors, (c) three predictors, (d) four
predictors, and (e) five predictors. Numbers
in parentheses represent the number of EPI's
which was fixed for the indicated parameter so
that the number of EPI's for the next predictor
could be varied.
EQUALLY POPULOUS INTERVALS


Fig. 3d. Same as Fig. 2a, except predictors, after the
first, are selected by having the lowest RSS FD for (a)
two predictors (SMF(6) and EH), (b) three predictors,
(c) four predictors, (d) five predictors, and (e) six
predictors.
NUMBER OF INTERVALS


The behavior of functional dependence (FD) as
determined from 100 randomly generated data sets
(Preisendorfer, 1983c) for EPI's of two through ten
for (a) the North Atlantic Ocean area 3W, 15 May-
15 July 1983, dependent data (1526 observations)
and (b) the North Pacific Ocean, July 1979, dependent
data (3682 observations). Plotted are FD(96)
(upper dashed), FD(05) (lower dashed), mean FD
(solid) and standard deviation (x 100) (chaindashes)
NUMBER OF INTERVALS


First stage contingency table statistics AA0,
dependent data (solid), and ATS1, independent data
(dashed), North Pacific Ocean, July 1979, as a
function of the number of EPI's, from the
Preisendorfer (1983 a,b) methodology. EHF is the predictor
for all EPI's.


DEP. DATA: AA0-SOLID, ATS1-DOTS
INDEP. DATA: AA0-DASHES, ATS1-CHAINDASHES


NUMBER  OF  INTERVALS 


Same  as  Fig.  5,  except  for  the  North  Atlantic 
Ocean  area  3W,  15  May-15  July  1983.  BMl  is  the 
predictor  for  all  EPI's. 


CRITICAL LEVEL STATISTICS


FORECAST ARRAY LENGTH


Same as Fig. 8, except each predictor is divided
into eight EPI's.
Fig. 10. Contingency table statistics AA0 and ATS1 for both
dependent and independent North Atlantic Ocean area
3W, 15 May-15 July 1983, data, without linear-
regression equations as predictors, as a function
of the number of predictors in the model for
strategies (a) MAXPROB1 and (b) MAXPROB2.
Predictors are SMF, D850, RH, UBLW and ENTRN, each
divided into eight EPI's. Negative values are not
plotted.
DEP. DATA: AA0-SOLID, ATS1-DOTS
INDEP. DATA: AA0-DASHES, ATS1-CHAINDASHES

NUMBER OF PREDICTORS

Fig. 12. Contingency table statistics AA0 and ATS1 for both
dependent and independent North Atlantic Ocean area
3W, 15 May-15 July 1983, data, with linear regression
equations as predictors, as a function of the number
of predictors in the model for strategies (a) MAXPROB1
and (b) MAXPROB2. Predictors are BM1, U850,
D500, V850, D1000 and U1000, each divided into four
EPI's.
Fig. 14. Bivariate plot of EHF as a function of both
equally populous intervals (EPI) and visibility
categories (VISCAT).

Joint and marginal probabilities of VISCAT's as a
function of EPI's for EHF.

Fig. 16. Conditional probabilities of VISCAT's as a
function of EPI's for EHF.
Fig. 17. Sample calculation of the average visibility
category (VISCAT), natural-regression strategy
for the first EPI (i = 1) of predictor EHF.
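
The quantities named in the captions for Figs. 14 through 17 (equally populous intervals, conditional probabilities of each VISCAT within an interval, and the average visibility category) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the way interval edges are chosen, the tie-breaking in the maximum-probability choice, and the toy data are not taken from the report, and the sketch does not distinguish between the MAXPROB1 and MAXPROB2 strategies.

```python
from collections import Counter

CATEGORIES = (1, 2, 3)


def epi_edges(values, n_intervals):
    """Upper edges that split the sorted values into intervals holding
    roughly equal numbers of observations."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_intervals - 1] for k in range(1, n_intervals)]


def epi_index(value, edges):
    """Index of the interval that a predictor value falls into."""
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)


def conditional_probabilities(predictor, viscat, n_intervals):
    """P(VISCAT | EPI) estimated from paired predictor/category data."""
    edges = epi_edges(predictor, n_intervals)
    counts = [Counter() for _ in range(n_intervals)]
    for x, c in zip(predictor, viscat):
        counts[epi_index(x, edges)][c] += 1
    probs = []
    for counter in counts:
        total = sum(counter.values())
        probs.append({c: (counter[c] / total if total else 0.0)
                      for c in CATEGORIES})
    return edges, probs


def max_probability_category(p):
    """Most probable category in an interval (lowest category on ties)."""
    return max(CATEGORIES, key=lambda c: p[c])


def average_category(p):
    """Probability-weighted mean category, as in a natural-regression
    style calculation."""
    return sum(c * p[c] for c in CATEGORIES)


# Toy data, invented for illustration only.
toy_predictor = [1, 2, 3, 4, 5, 6, 7, 8]
toy_viscat = [1, 1, 1, 2, 3, 3, 3, 3]
edges, probs = conditional_probabilities(toy_predictor, toy_viscat, 2)
```

With the toy data the first interval holds categories 1, 1, 1 and 2, so its most probable category is 1 and its average category is 1.25.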
Fig. 18. Sample calculation of potential predictability
(PP) of visibility by predictor EHF.

Fig. 21b. Reduction of the three-dimensional problem, in
Fig. 21a, to two dimensions.
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Fig.  23.  Sample  calculation  of  functional  dependence  (FD) 
of  H LI  on  iKF . 
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